Preface



This is an introduction to linear algebra. The main part of the book features row operations and 
everything is done in terms of the row reduced echelon form and specific algorithms. At the end, the 
more abstract notions of vector spaces and linear transformations on vector spaces are presented. 
However, this is intended to be a first course in linear algebra for students who are sophomores 
or juniors who have had a course in one variable calculus and a reasonable background in college 
algebra. I have given complete proofs of all the fundamental ideas, but some topics such as Markov 
matrices are not complete in this book but receive a plausible introduction. The book contains a 
complete treatment of determinants and a simple proof of the Cayley Hamilton theorem although 
these are optional topics. The Jordan form is presented as an appendix. I see this theorem as the 
beginning of more advanced topics in linear algebra and not really part of a beginning linear algebra 
course. There are extensions of many of the topics of this book in my online book [11]. I have also
not emphasized that linear algebra can be carried out with any field although there is an optional 
section on this topic, most of the book being devoted to either the real numbers or the complex 
numbers. It seems to me this is a reasonable specialization for a first course in linear algebra. 

Linear algebra is a wonderful, interesting subject. It is a shame when it degenerates into nothing
more than a challenge to do the arithmetic correctly. It seems to me that the use of a computer
algebra system can be a great help in avoiding this sort of tedium. I don't want to overemphasize
the use of technology, which is easy to do if you are not careful, but there are certain standard 
things which are best done by the computer. Some of these include the row reduced echelon form, 
PLU factorization, and QR factorization. It is much more fun to let the machine do the tedious 
calculations than to suffer with them yourself. However, it is not good when the use of the computer 
algebra system degenerates into simply asking it for the answer without understanding what the 
oracular software is doing. With this in mind, there are a few interactive links which explain how 
to use a computer algebra system to accomplish some of these more tedious standard tasks. These 
are obtained by clicking on the symbol ►. I have included how to do it using Maple and Scientific
Notebook because these are the two systems I am familiar with and have on my computer. Other
systems could be featured as well. It is expected that people will use such computer algebra systems 
to do the exercises in this book whenever it would be helpful to do so, rather than wasting huge 
amounts of time doing computations by hand. However, this is not a book on numerical analysis so 
no effort is made to consider many important numerical analysis issues. 
The reader should be familiar with most of the topics in this chapter. However, it is often the case
that set notation is not familiar and so a short discussion of this is included first. Complex numbers 
are then considered in somewhat more detail. Many of the applications of linear algebra require the 
use of complex numbers, so this is the reason for this introduction. 

1.1 Sets And Set Notation 

A set is just a collection of things called elements. Often these are also referred to as points in
calculus. For example $\{1, 2, 3, 8\}$ would be a set consisting of the elements 1, 2, 3, and 8. To indicate
that 3 is an element of $\{1, 2, 3, 8\}$, it is customary to write $3 \in \{1, 2, 3, 8\}$. $9 \notin \{1, 2, 3, 8\}$ means 9 is
not an element of $\{1, 2, 3, 8\}$. Sometimes a rule specifies a set. For example you could specify a set
as all integers larger than 2. This would be written as $S = \{x \in \mathbb{Z} : x > 2\}$. This notation says: the
set of all integers, $x$, such that $x > 2$.

If A and B are sets with the property that every element of A is an element of B, then A is a subset
of B. For example, $\{1, 2, 3, 8\}$ is a subset of $\{1, 2, 3, 4, 5, 8\}$, in symbols, $\{1, 2, 3, 8\} \subseteq \{1, 2, 3, 4, 5, 8\}$.
It is sometimes said that "A is contained in B" or even "B contains A". The same statement about
the two sets may also be written as $\{1, 2, 3, 4, 5, 8\} \supseteq \{1, 2, 3, 8\}$.

The union of two sets is the set consisting of everything which is an element of at least one of
the sets, A or B. As an example of the union of two sets, $\{1, 2, 3, 8\} \cup \{3, 4, 7, 8\} = \{1, 2, 3, 4, 7, 8\}$
because these numbers are those which are in at least one of the two sets. In general

$$A \cup B = \{x : x \in A \text{ or } x \in B\}.$$

Be sure you understand that something which is in both A and B is in the union. It is not an 
exclusive or. 

The intersection of two sets, A and B, consists of everything which is in both of the sets. Thus
$\{1, 2, 3, 8\} \cap \{3, 4, 7, 8\} = \{3, 8\}$ because 3 and 8 are those elements the two sets have in common. In
general,

$$A \cap B = \{x : x \in A \text{ and } x \in B\}.$$

The symbol $[a, b]$, where a and b are real numbers, denotes the set of real numbers x such that
$a \leq x \leq b$, and $[a, b)$ denotes the set of real numbers such that $a \leq x < b$. $(a, b)$ consists of the set of
real numbers x such that $a < x < b$ and $(a, b]$ indicates the set of numbers x such that $a < x \leq b$.
$[a, \infty)$ means the set of all numbers x such that $x \geq a$ and $(-\infty, a]$ means the set of all real numbers
which are less than or equal to a. These sorts of sets of real numbers are called intervals. The two
points a and b are called endpoints of the interval. Other intervals such as $(-\infty, b)$ are defined by
analogy to what was just explained. In general, the curved parenthesis indicates the end point it
sits next to is not included while the square parenthesis indicates this end point is included. The
reason that there will always be a curved parenthesis next to $\infty$ or $-\infty$ is that these are not real
numbers. Therefore, they cannot be included in any set of real numbers.

A special set which needs to be given a name is the empty set, also called the null set, denoted
by $\emptyset$. Thus $\emptyset$ is defined as the set which has no elements in it. Mathematicians like to say the empty
set is a subset of every set. The reason they say this is that if it were not so, there would have to
exist a set A such that $\emptyset$ has something in it which is not in A. However, $\emptyset$ has nothing in it and so
the least intellectual discomfort is achieved by saying $\emptyset \subseteq A$.

If A and B are two sets, $A \setminus B$ denotes the set of things which are in A but not in B. Thus

$$A \setminus B = \{x \in A : x \notin B\}.$$

Set notation is used whenever convenient. 
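
The set operations above can be tried out directly in Python, whose built-in set type happens to mirror this notation. This is only an illustrative sketch: the choice of Python and the variable names are assumptions made here, and since a Python set is finite, the rule-defined set S is cut down to a finite window of integers.

```python
# Finite sets taken from the examples in the text.
A = {1, 2, 3, 8}
B = {3, 4, 7, 8}

union = A | B                       # everything in at least one of A, B
intersection = A & B                # everything in both A and B
difference = A - B                  # things in A but not in B
is_member = 3 in A                  # membership test
is_subset = A <= {1, 2, 3, 4, 5, 8} # subset test
empty_is_subset = set() <= A        # the empty set is a subset of every set

# A rule-specified set: integers x with x > 2, restricted to -10..10
# because a Python set cannot hold infinitely many elements.
S_window = {x for x in range(-10, 11) if x > 2}
```

Note that `A | B` contains 3 and 8 only once each, which reflects the fact that the union is not an exclusive or.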

To illustrate the use of this notation relative to intervals consider three examples of inequalities. 
Their solutions will be written in the notation just described. 

Example 1.1.1 Solve the inequality $2x + 4 \leq x - 8$.

$x \leq -12$ is the answer. This is written in terms of an interval as $(-\infty, -12]$.

Example 1.1.2 Solve the inequality $(x + 1)(2x - 3) \geq 0$.

The solution is $x \leq -1$ or $x \geq \frac{3}{2}$. In terms of set notation this is denoted by $(-\infty, -1] \cup [\frac{3}{2}, \infty)$.

Example 1.1.3 Solve the inequality $x(x + 2) \geq -4$.

This is true for any value of x. It is written as $\mathbb{R}$ or $(-\infty, \infty)$.
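
A claimed solution set can be sanity-checked numerically. The sketch below, in Python with exact fractions, compares the inequality (x + 1)(2x - 3) ≥ 0 of Example 1.1.2 with the set "x ≤ -1 or x ≥ 3/2" on a grid of sample points, and checks that x(x + 2) ≥ -4 holds at every sample as in Example 1.1.3. The helper names and the sampling grid are assumptions chosen for illustration, and agreement on samples is evidence rather than a proof.

```python
from fractions import Fraction

def satisfies_example_2(x):
    """The inequality of Example 1.1.2."""
    return (x + 1) * (2 * x - 3) >= 0

def in_claimed_set(x):
    """Membership in (-oo, -1] U [3/2, oo)."""
    return x <= -1 or x >= Fraction(3, 2)

# Quarter-integer samples from -10 to 10; these include the endpoints -1 and 3/2.
samples = [Fraction(n, 4) for n in range(-40, 41)]

example_2_agrees = all(satisfies_example_2(x) == in_claimed_set(x) for x in samples)
example_3_always_true = all(x * (x + 2) >= -4 for x in samples)
```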

1.2 Functions 

The concept of a function is that of something which gives a unique output for a given input. 

Definition 1.2.1 Consider two sets, D and R, along with a rule which assigns a unique element
of R to every element of D. This rule is called a function and it is denoted by a letter such as f.
Given $x \in D$, $f(x)$ is the name of the thing in R which results from doing f to x. Then D is called
the domain of f. In order to specify that D pertains to f, the notation $D(f)$ may be used. The
set R is sometimes called the range of f. These days it is referred to as the codomain. The set of
all elements of R which are of the form $f(x)$ for some $x \in D$ is therefore a subset of R. This is
sometimes referred to as the image of f. When this set equals R, the function f is said to be onto,
also surjective. If whenever $x \neq y$ it follows $f(x) \neq f(y)$, the function is called one to one,
also injective. It is common notation to write $f : D \mapsto R$ to denote the situation just described
in this definition where f is a function defined on a domain D which has values in a codomain R.

Sometimes you may also see something like $D \overset{f}{\mapsto} R$ to denote the same thing.

Example 1.2.2 Let D consist of the set of people who have lived on the earth except for Adam and
for $d \in D$, let $f(d)$ = the biological father of d. Then f is a function.

This function is not the sort of thing studied in calculus but it is a function just the same. 

Example 1.2.3 Consider the list of numbers $\{1, 2, 3, 4, 5, 6, 7\} = D$. Define a function which assigns
an element of $R = \{2, 3, 4, 5, 6, 7, 8\}$ to each element of D by $f(x) = x + 1$ for each $x \in D$.

This function is onto because every element of R is the result of doing f to something in D. The
function is also one to one. This is because if $x + 1 = y + 1$, then it follows $x = y$. Thus different
elements of D must go to different elements of R.

In this example there was a clearly defined procedure which determined the function. However, 
sometimes there is no discernible procedure which yields a particular function. 

Example 1.2.4 Consider the ordered pairs $(1, 2)$, $(2, -2)$, $(8, 3)$, $(7, 6)$ and let

$$D = \{1, 2, 8, 7\},$$

the set of first entries in the given set of ordered pairs, and let

$$R = \{2, -2, 3, 6\},$$

the set of second entries, and let $f(1) = 2$, $f(2) = -2$, $f(8) = 3$, and $f(7) = 6$.

This specifies a function even though it does not come from a convenient formula. 
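
For finite domains, the properties in Definition 1.2.1 can be checked mechanically. The following Python sketch does this for Examples 1.2.3 and 1.2.4. The helper names `image`, `is_onto`, and `is_one_to_one` are hypothetical, introduced only for this illustration, and representing the function of Example 1.2.4 by a lookup table is one possible encoding among many.

```python
def image(func, domain):
    """The set of all values func(x) for x in the domain."""
    return {func(x) for x in domain}

def is_onto(func, domain, codomain):
    """Onto (surjective): the image equals the whole codomain."""
    return image(func, domain) == set(codomain)

def is_one_to_one(func, domain):
    """One to one (injective): different inputs give different outputs."""
    values = [func(x) for x in domain]
    return len(set(values)) == len(values)

# Example 1.2.3: f(x) = x + 1 from D to R.
D = [1, 2, 3, 4, 5, 6, 7]
R = {2, 3, 4, 5, 6, 7, 8}

def f(x):
    return x + 1

# Example 1.2.4: a function given by ordered pairs, with no formula.
table = {1: 2, 2: -2, 8: 3, 7: 6}

def g(x):
    return table[x]
```

For instance, `is_onto(f, D, R)` and `is_one_to_one(f, D)` both evaluate to `True`, in agreement with the discussion of Example 1.2.3.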
1.3 Graphs Of Functions

Recall the notion of the Cartesian coordinate system you probably saw earlier. It involved an x axis
and a y axis, two lines which intersect each other at right angles, and one identifies a point by specifying a
pair of numbers. For example, the point (2, 3) involves going 2 units to the right on the x axis and
then 3 units directly up on a line perpendicular to the x axis. For example, consider the following
picture.

[Figure: the point (2, 3) plotted in the plane with x and y axes.]



Because of the simple correspondence between points in the plane and the coordinates of a point 
in the plane, it is often the case that people are a little sloppy in referring to these things. Thus, it 
is common to see (x, y) referred to as a point in the plane. In terms of relations, if you graph the 
points as just described, you will have a way of visualizing the relation. 

The reader has likely encountered the notion of graphing relations of the form $y = 2x + 3$ or
$y = x^2 + 5$. The meaning of such an expression in terms of defining a relation is as follows. The
relation determined by the equation $y = 2x + 3$ means the set of all ordered pairs $(x, y)$ which are
related by this formula. Thus the relation can be written as

$$\{(x, y) : y = 2x + 3\}.$$

The relation determined by $y = x^2 + 5$ is

$$\{(x, y) : y = x^2 + 5\}.$$

Note that these relations are also functions. For the first, you could let $f(x) = 2x + 3$ and this would
tell you a rule which tells what the function does to x. However, some relations are not functions.
For example, you could consider $x^2 + y^2 = 1$. Written more formally, the relation it defines is

$$\{(x, y) : x^2 + y^2 = 1\}.$$

Now if you give a value for x, there might be two values for y which are associated with the given
value for x. In fact

$$y = \pm\sqrt{1 - x^2}.$$

Thus this relation would not be a function.

Recall how to graph a relation. You first found lots of ordered pairs which satisfied the relation.
For example $(0, 3)$, $(1, 5)$, and $(-1, 1)$ all satisfy $y = 2x + 3$ which describes a straight line. Then
you connected them with a curve. Here are some simple examples which you should see that you
understand. First here is the graph of $y = x^2 + 1$.

Now here is the graph of the relation $y = 2x + 1$ which is a straight line.

Sometimes a relation is defined using different formulas depending on the location of one of the
variables. For example, consider 

x < -2 

-2 <x < 3 

x > 3 

Then the graph of this relation is sketched below. 
A very important type of relation is one of the form $y - y_0 = m(x - x_0)$, where $m$, $x_0$, and $y_0$ are
numbers. The reason this is important is that if there are two points, $(x_1, y_1)$ and $(x_2, y_2)$, which
satisfy this relation, then

$$\frac{y_1 - y_2}{x_1 - x_2} = \frac{(y_1 - y_0) - (y_2 - y_0)}{x_1 - x_2} = \frac{m(x_1 - x_0) - m(x_2 - x_0)}{x_1 - x_2} = \frac{m(x_1 - x_2)}{x_1 - x_2} = m.$$

Remember from high school, the slope of the line segment through two points is always the difference
in the y values divided by the difference in the x values, taken in the same order. Sometimes this is
referred to as the rise divided by the run. This shows that there is a constant slope m, the slope of
the line, between any pair of points satisfying this relation. Such a relation is called a straight line.
Also, the point $(x_0, y_0)$ satisfies the relation. This is more often called the equation of the straight
line.
Geometrically, this means the graph of the relation is a straight line because the slope between
any two points is always the same.

Example 1.3.1 Find the relation for a straight line which contains the point (1, 2) and has constant 
slope equal to 3. 

From the above discussion, $y - 2 = 3(x - 1)$.
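
The constant slope property can be observed on the line of Example 1.3.1. The Python sketch below picks a few points on y - 2 = 3(x - 1) and computes the slope between every pair of them, using exact fractions so that no rounding enters. The function name `line_y` and the particular sample x values are assumptions made for this illustration.

```python
from fractions import Fraction
from itertools import combinations

def line_y(x, m=3, x0=1, y0=2):
    """Solve y - y0 = m (x - x0) for y."""
    return y0 + m * (x - x0)

# A few points on the line, including (1, 2) itself.
points = [(Fraction(x), line_y(Fraction(x))) for x in (-2, 0, 1, 5, 7)]

# Rise over run for every pair of distinct points.
slopes = {(y1 - y2) / (x1 - x2)
          for (x1, y1), (x2, y2) in combinations(points, 2)}
```

The set `slopes` collapses to the single value 3, whichever pair of points is used.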

Definition 1.3.2 Let $f : D(f) \mapsto R(f)$ be a function. The graph of f consists of the set

$$\{(x, y) : y = f(x) \text{ for } x \in D(f)\}.$$

Note that knowledge of the graph of a function is equivalent to knowledge of the function. To
find $f(x)$, simply observe the ordered pair which has x as its first element and the value of y equals
$f(x)$.

Here is the graph of the function, $f(x) = x^2 - 2$.




1.4 The Complex Numbers 

Recall that a real number is a point on the real number line. Just as a real number should be 
considered as a point on the line, a complex number is considered a point in the plane which can 
be identified in the usual way using the Cartesian coordinates of the point. Thus (a, b) identifies a 
point whose x coordinate is a and whose y coordinate is b. In dealing with complex numbers, such 
a point is written as a + ib. For example, in the following picture, I have graphed the point 3 + 2i. 
You see it corresponds to the point in the plane whose coordinates are (3, 2) . 



[Figure: the point $3 + 2i$ plotted in the plane at coordinates (3, 2).]

Multiplication and addition are defined in the most obvious way subject to the convention that
$i^2 = -1$. Thus,

$$(a + ib) + (c + id) = (a + c) + i(b + d)$$

and

$$(a + ib)(c + id) = ac + iad + ibc + i^2 bd = (ac - bd) + i(bc + ad).$$

Every nonzero complex number $a + ib$, with $a^2 + b^2 \neq 0$, has a unique multiplicative inverse.

$$\frac{1}{a + ib} = \frac{a - ib}{a^2 + b^2} = \frac{a}{a^2 + b^2} - i\frac{b}{a^2 + b^2}.$$
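
The product and inverse formulas can be checked against a machine implementation of complex arithmetic. In the Python sketch below, a complex number a + ib is represented as the pair (a, b). The function names `multiply` and `inverse` are hypothetical helpers written for this illustration, and the comparison with Python's built-in `complex` type is only a consistency check.

```python
def multiply(a, b, c, d):
    """(a + ib)(c + id) = (ac - bd) + i(bc + ad), returned as (real, imaginary)."""
    return (a * c - b * d, b * c + a * d)

def inverse(a, b):
    """1/(a + ib) = a/(a^2 + b^2) - i b/(a^2 + b^2), returned as (real, imaginary)."""
    n = a * a + b * b
    if n == 0:
        raise ZeroDivisionError("0 has no multiplicative inverse")
    return (a / n, -b / n)

# (3 + 2i)(1 - 4i) computed from the formula and by the built-in type.
by_formula = multiply(3, 2, 1, -4)
by_builtin = complex(3, 2) * complex(1, -4)

# Multiplying 3 + 2i by its inverse should give 1 + 0i up to rounding.
inv = inverse(3, 2)
check = multiply(3, 2, inv[0], inv[1])
```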
You should prove the following theorem. 
Theorem 1.4.1 The complex numbers with multiplication and addition defined as above form a
field satisfying all the field axioms. These are the following list of properties.

1. $x + y = y + x$, (commutative law for addition).

2. $x + 0 = x$, (additive identity).

3. For each $x$ in the field, there exists $-x$ in the field such that $x + (-x) = 0$, (existence of additive inverse).

4. $(x + y) + z = x + (y + z)$, (associative law for addition).

5. $xy = yx$, (commutative law for multiplication). You could write this as $x \times y = y \times x$.

6. $(xy)z = x(yz)$, (associative law for multiplication).

7. $1x = x$, (multiplicative identity).

8. For each $x \neq 0$, there exists $x^{-1}$ such that $xx^{-1} = 1$, (existence of multiplicative inverse).

9. $x(y + z) = xy + xz$, (distributive law).

Something which satisfies these axioms is called a field. Linear algebra is all about fields, although
in this book, the field of most interest will be the field of complex numbers or the field of real numbers.
You have seen in earlier courses that the real numbers also satisfy the above axioms. The field
of complex numbers is denoted as $\mathbb{C}$ and the field of real numbers is denoted as $\mathbb{R}$. An important
construction regarding complex numbers is the complex conjugate denoted by a horizontal line above
the number. It is defined as follows.

$$\overline{a + ib} = a - ib.$$

What it does is reflect a given complex number across the x axis. Algebraically, the following formula
is easy to obtain.

$$\overline{(a + ib)}\,(a + ib) = (a - ib)(a + ib) = a^2 + b^2 - i(ab - ab) = a^2 + b^2.$$

Definition 1.4.2 Define the absolute value of a complex number as follows.

$$|a + ib| = \sqrt{a^2 + b^2}.$$

Thus, denoting by z the complex number $z = a + ib$,

$$|z| = (z\overline{z})^{1/2}.$$

Also from the definition, if $z = x + iy$ and $w = u + iv$ are two complex numbers, then

$$|zw| = |z|\,|w|.$$

You should verify this. ►

The triangle inequality holds for the absolute value for complex numbers just as it does for the 
ordinary absolute value. 

Proposition 1.4.3 Let z, w be complex numbers. Then the triangle inequality holds.

$$|z + w| \leq |z| + |w|, \qquad \big||z| - |w|\big| \leq |z - w|.$$

Proof: Let $z = x + iy$ and $w = u + iv$. First note that

$$z\overline{w} = (x + iy)(u - iv) = xu + yv + i(yu - xv)$$

and so $|xu + yv| \leq |z\overline{w}| = |z|\,|w|$. Then

$$|z + w|^2 = (x + u + i(y + v))(x + u - i(y + v))$$

$$= (x + u)^2 + (y + v)^2 = x^2 + u^2 + 2xu + 2yv + y^2 + v^2$$

$$\leq |z|^2 + |w|^2 + 2|z|\,|w| = (|z| + |w|)^2,$$

so this shows the first version of the triangle inequality. To get the second,

$$z = z - w + w, \qquad w = w - z + z$$

and so by the first form of the inequality

$$|z| \leq |z - w| + |w|, \qquad |w| \leq |z - w| + |z|$$

and so both $|z| - |w|$ and $|w| - |z|$ are no larger than $|z - w|$ and this proves the second version
because $\big||z| - |w|\big|$ is one of $|z| - |w|$ or $|w| - |z|$. ■

With this definition, it is important to note the following. Be sure to verify this. It is not too 
hard but you need to do it. 

Remark 1.4.4 Let $z = a + ib$ and $w = c + id$. Then $|z - w| = \sqrt{(a - c)^2 + (b - d)^2}$. Thus the
distance between the point in the plane determined by the ordered pair $(a, b)$ and the ordered pair
$(c, d)$ equals $|z - w|$ where z and w are as just described.

For example, consider the distance between $(2, 5)$ and $(1, 8)$. From the distance formula this
distance equals $\sqrt{(2 - 1)^2 + (5 - 8)^2} = \sqrt{10}$. On the other hand, letting $z = 2 + i5$ and $w = 1 + i8$,
$z - w = 1 - i3$ and so $(z - w)\overline{(z - w)} = (1 - i3)(1 + i3) = 10$ so $|z - w| = \sqrt{10}$, the same thing
obtained with the distance formula.
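
The distance computation and the two forms of the triangle inequality can be confirmed numerically. The Python sketch below redoes the example with z = 2 + i5 and w = 1 + i8 and then tests both inequalities. The helper `triangle_holds` and its small tolerance `tol`, which absorbs floating point rounding, are assumptions of this sketch.

```python
import math

z = complex(2, 5)
w = complex(1, 8)

# Distance between (2, 5) and (1, 8) from the distance formula.
distance = math.sqrt((2 - 1) ** 2 + (5 - 8) ** 2)

# The same number obtained as |z - w|, and (z - w) times its conjugate.
modulus = abs(z - w)
via_conjugate = ((z - w) * (z - w).conjugate()).real

def triangle_holds(z, w, tol=1e-12):
    """Check |z + w| <= |z| + |w| and ||z| - |w|| <= |z - w|."""
    first = abs(z + w) <= abs(z) + abs(w) + tol
    second = abs(abs(z) - abs(w)) <= abs(z - w) + tol
    return first and second
```

Here `via_conjugate` equals 10, so `modulus` is the square root of 10, matching the distance formula.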
1.5 Polar Form Of Complex Numbers

Complex numbers are often written in the so called polar form which is described next. Suppose
$z = x + iy$ is a complex number. Then

$$x + iy = \sqrt{x^2 + y^2}\left(\frac{x}{\sqrt{x^2 + y^2}} + i\frac{y}{\sqrt{x^2 + y^2}}\right).$$

Now note that

$$\left(\frac{x}{\sqrt{x^2 + y^2}}\right)^2 + \left(\frac{y}{\sqrt{x^2 + y^2}}\right)^2 = 1$$

and so

$$\left(\frac{x}{\sqrt{x^2 + y^2}}, \frac{y}{\sqrt{x^2 + y^2}}\right)$$

is a point on the unit circle. Therefore, there exists a unique angle $\theta \in [0, 2\pi)$ such that

$$\cos\theta = \frac{x}{\sqrt{x^2 + y^2}}, \qquad \sin\theta = \frac{y}{\sqrt{x^2 + y^2}}.$$

The polar form of the complex number is then

$$r(\cos\theta + i\sin\theta)$$

where $\theta$ is this angle just described and $r = \sqrt{x^2 + y^2} = |z|$.

[Figure: the point $x + iy = r(\cos\theta + i\sin\theta)$ in the plane, at distance $r = \sqrt{x^2 + y^2}$ from the origin.]
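
The passage from x + iy to the pair (r, θ) can be sketched in Python with the standard `cmath` module. Note that `cmath.phase` returns an angle between -π and π, so the sketch reduces it modulo 2π to match the convention θ ∈ [0, 2π) used here. The function names `polar_form` and `from_polar` are hypothetical helpers for this illustration.

```python
import cmath
import math

def polar_form(z):
    """Return (r, theta) with r = |z| and theta in [0, 2*pi)."""
    if z == 0:
        raise ValueError("the angle is not defined for z = 0")
    r = abs(z)
    theta = cmath.phase(z) % (2 * math.pi)  # shift from (-pi, pi] into [0, 2*pi)
    return r, theta

def from_polar(r, theta):
    """Rebuild r (cos(theta) + i sin(theta))."""
    return r * complex(math.cos(theta), math.sin(theta))

# The number -i lies on the unit circle at angle 3*pi/2.
r, theta = polar_form(complex(0, -1))
```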



1.6 Roots Of Complex Numbers 

A fundamental identity is the formula of De Moivre which follows.

Theorem 1.6.1 Let $r > 0$ be given. Then if n is a positive integer,

$$[r(\cos t + i\sin t)]^n = r^n(\cos nt + i\sin nt).$$

Proof: It is clear the formula holds if $n = 1$. Suppose it is true for n. Then

$$[r(\cos t + i\sin t)]^{n+1} = [r(\cos t + i\sin t)]^n\,[r(\cos t + i\sin t)]$$

which by induction equals

$$= r^{n+1}(\cos nt + i\sin nt)(\cos t + i\sin t)$$

$$= r^{n+1}\big((\cos nt\cos t - \sin nt\sin t) + i(\sin nt\cos t + \cos nt\sin t)\big)$$

$$= r^{n+1}(\cos(n + 1)t + i\sin(n + 1)t)$$

by the formulas for the cosine and sine of the sum of two angles. ■
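
De Moivre's formula is easy to test numerically for particular values. The Python sketch below computes both sides for a given r, t, and n. The function name `de_moivre_sides` and the sample values r = 1.5, t = 0.7 are arbitrary choices made here, and a numerical agreement of this kind illustrates the theorem but does not replace the induction proof.

```python
import math

def de_moivre_sides(r, t, n):
    """Return both sides of [r (cos t + i sin t)]^n = r^n (cos nt + i sin nt)."""
    lhs = (r * complex(math.cos(t), math.sin(t))) ** n
    rhs = r ** n * complex(math.cos(n * t), math.sin(n * t))
    return lhs, rhs

# Largest gap between the two sides for n = 1, ..., 6.
largest_gap = max(abs(lhs - rhs)
                  for lhs, rhs in (de_moivre_sides(1.5, 0.7, n) for n in range(1, 7)))
```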

Corollary 1.6.2 Let z be a nonzero complex number. Then there are always exactly k $k^{th}$ roots of
z in $\mathbb{C}$.

Proof: Let $z = x + iy$ and let $z = |z|(\cos t + i\sin t)$ be the polar form of the complex number.
By De Moivre's theorem, a complex number

$$r(\cos\alpha + i\sin\alpha)$$

is a $k^{th}$ root of z if and only if

$$r^k(\cos k\alpha + i\sin k\alpha) = |z|(\cos t + i\sin t).$$

This requires $r^k = |z|$ and so $r = |z|^{1/k}$ and also both $\cos(k\alpha) = \cos t$ and $\sin(k\alpha) = \sin t$. This can
only happen if

$$k\alpha = t + 2l\pi$$

for $l$ an integer. Thus

$$\alpha = \frac{t + 2l\pi}{k}, \quad l \in \mathbb{Z}$$






Elementary Linear Algebra Some Prerequisite Topics 



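and so the $k^{th}$ roots of z are of the form

$$|z|^{1/k}\left(\cos\left(\frac{t + 2l\pi}{k}\right) + i\sin\left(\frac{t + 2l\pi}{k}\right)\right), \quad l \in \mathbb{Z}.$$

Since the cosine and sine are periodic of period $2\pi$, there are exactly k distinct numbers which result
from this formula. ■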
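
The formula in the proof translates directly into a short routine. The Python sketch below lists the k roots obtained from l = 0, 1, ..., k - 1. The function name `kth_roots` is hypothetical, and the angle t is taken from `cmath.phase`, which lies between -π and π rather than in [0, 2π); this changes the order in which the roots appear but not the set of roots.

```python
import cmath
import math

def kth_roots(z, k):
    """All k-th roots of a nonzero complex number z, one for each l = 0, ..., k-1."""
    if z == 0:
        raise ValueError("z must be nonzero")
    r = abs(z) ** (1.0 / k)   # |z|^(1/k)
    t = cmath.phase(z)        # an angle with z = |z| (cos t + i sin t)
    return [r * complex(math.cos((t + 2 * math.pi * l) / k),
                        math.sin((t + 2 * math.pi * l) / k))
            for l in range(k)]

# The three cube roots of i.
cube_roots_of_i = kth_roots(1j, 3)
```

Cubing each entry of `cube_roots_of_i` returns i up to rounding error, and one of the entries is -i.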

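Example 1.6.3 Find the three cube roots of i.

First note that $i = 1\left(\cos\left(\frac{\pi}{2}\right) + i\sin\left(\frac{\pi}{2}\right)\right)$. Using the formula in the proof of the above corollary,
the cube roots of i are

$$1\left(\cos\left(\frac{(\pi/2) + 2l\pi}{3}\right) + i\sin\left(\frac{(\pi/2) + 2l\pi}{3}\right)\right)$$

where $l = 0, 1, 2$. Therefore, the roots are

$$\cos\left(\frac{\pi}{6}\right) + i\sin\left(\frac{\pi}{6}\right), \quad \cos\left(\frac{5}{6}\pi\right) + i\sin\left(\frac{5}{6}\pi\right),$$

and

$$\cos\left(\frac{3}{2}\pi\right) + i\sin\left(\frac{3}{2}\pi\right).$$

Thus the cube roots of i are $\frac{\sqrt{3}}{2} + i\left(\frac{1}{2}\right)$, $-\frac{\sqrt{3}}{2} + i\left(\frac{1}{2}\right)$, and $-i$.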

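The ability to find $k^{th}$ roots can also be used to factor some polynomials.

Example 1.6.4 Factor the polynomial $x^3 - 27$.

First find the cube roots of 27. By the above procedure using De Moivre's theorem, these cube
roots are $3$, $3\left(-\frac{1}{2} + i\frac{\sqrt{3}}{2}\right)$, and $3\left(-\frac{1}{2} - i\frac{\sqrt{3}}{2}\right)$. Therefore,

$$x^3 - 27 = (x - 3)\left(x - 3\left(-\frac{1}{2} + i\frac{\sqrt{3}}{2}\right)\right)\left(x - 3\left(-\frac{1}{2} - i\frac{\sqrt{3}}{2}\right)\right).$$

Note also $\left(x - 3\left(-\frac{1}{2} + i\frac{\sqrt{3}}{2}\right)\right)\left(x - 3\left(-\frac{1}{2} - i\frac{\sqrt{3}}{2}\right)\right) = x^2 + 3x + 9$ and so

$$x^3 - 27 = (x - 3)(x^2 + 3x + 9)$$

where the quadratic polynomial $x^2 + 3x + 9$ cannot be factored without using complex numbers.
Note that even though the polynomial $x^3 - 27$ has all real coefficients, it has some complex zeros,
$3\left(-\frac{1}{2} + i\frac{\sqrt{3}}{2}\right)$ and $3\left(-\frac{1}{2} - i\frac{\sqrt{3}}{2}\right)$. These zeros are complex conjugates of each other. It is always this way.
You should show this is the case. To see how to do this, see Problems 13 and 14 below.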
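
The claim that the two complex factors multiply out to x² + 3x + 9 rests on the identity (x - r₁)(x - r₂) = x² - (r₁ + r₂)x + r₁r₂. The Python sketch below evaluates the two coefficients for the conjugate pair of cube roots of 27. The variable names are chosen here for illustration, and the results are floating point values, so they should be compared with 3 and 9 only up to rounding.

```python
import math

# The two non-real cube roots of 27; they are complex conjugates.
r1 = 3 * complex(-0.5, math.sqrt(3) / 2)
r2 = r1.conjugate()

# (x - r1)(x - r2) = x^2 - (r1 + r2) x + r1 r2
linear_coeff = -(r1 + r2)   # close to 3 + 0i
constant_coeff = r1 * r2    # close to 9 + 0i

# Each root, cubed, should give back 27 up to rounding.
cubes = [r1 ** 3, r2 ** 3]
```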

Another fact for your information is the fundamental theorem of algebra. This theorem says 
that any polynomial of degree at least 1 having any complex coefficients always has a root in $\mathbb{C}$.
This is sometimes referred to by saying C is algebraically complete. Gauss is usually credited with 
giving a proof of this theorem in 1797 but many others worked on it and the first completely correct 
proof was due to Argand in 1806. For more on this theorem, you can google fundamental theorem 
of algebra and look at the interesting Wikipedia article on it. Proofs of this theorem usually involve 
the use of techniques from calculus even though it is really a result in algebra. 
1.7 The Quadratic Formula

The quadratic formula

$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$

gives the solutions x to

$$ax^2 + bx + c = 0$$

where a, b, c are real numbers with $a \neq 0$. It holds even if $b^2 - 4ac < 0$. This is easy to show from the above.
There are exactly two square roots to this number $b^2 - 4ac$ from the above methods using De Moivre's
theorem. These roots are of the form

$$\sqrt{4ac - b^2}\left(\cos\left(\frac{\pi}{2}\right) + i\sin\left(\frac{\pi}{2}\right)\right) = i\sqrt{4ac - b^2}$$

and

$$\sqrt{4ac - b^2}\left(\cos\left(\frac{3\pi}{2}\right) + i\sin\left(\frac{3\pi}{2}\right)\right) = -i\sqrt{4ac - b^2}.$$

Thus the solutions, according to the quadratic formula, are still given correctly by the above formula.
Do these solutions predicted by the quadratic formula continue to solve the quadratic equation?
Yes, they do. You only need to observe that when you square a square root of a complex number z,
you recover z. Thus

$$a\left(\frac{-b + \sqrt{b^2 - 4ac}}{2a}\right)^2 + b\left(\frac{-b + \sqrt{b^2 - 4ac}}{2a}\right) + c$$

$$= a\left(\frac{1}{2a^2}b^2 - \frac{1}{a}c - \frac{1}{2a^2}b\sqrt{b^2 - 4ac}\right) + b\left(\frac{-b + \sqrt{b^2 - 4ac}}{2a}\right) + c$$

$$= -\frac{1}{2a}\left(b\sqrt{b^2 - 4ac} + 2ac - b^2\right) + \frac{1}{2a}\left(b\sqrt{b^2 - 4ac} - b^2\right) + c = 0.$$

Similar reasoning shows directly that $\frac{-b - \sqrt{b^2 - 4ac}}{2a}$ also solves the quadratic equation.

What if the coefficients of the quadratic equation are actually complex numbers? Does the 
formula hold even in this case? The answer is yes. This is a hint on how to do Problem 23 below, a 
special case of the fundamental theorem of algebra, and an ingredient in the proof of some versions 
of this theorem.

Example 1.7.1 Find the solutions to $x^2 - 2ix - 5 = 0$.

Formally, from the quadratic formula, these solutions are

$$x = \frac{2i \pm \sqrt{-4 + 20}}{2} = \frac{2i \pm 4}{2} = i \pm 2.$$

Now you can check that these really do solve the equation. In general, this will be the case. See
Problem 23 below.
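
The check suggested in Example 1.7.1 can be carried out by machine. The Python sketch below applies the quadratic formula with `cmath.sqrt`, which returns one square root of a complex number; the other square root is its negative, which is why both signs appear. The function name `quadratic_roots` is a hypothetical helper, and the coefficients may be real or complex as long as a is not zero.

```python
import cmath

def quadratic_roots(a, b, c):
    """Both solutions of a x^2 + b x + c = 0 for complex a, b, c with a != 0."""
    if a == 0:
        raise ValueError("a must be nonzero")
    s = cmath.sqrt(b * b - 4 * a * c)   # one square root of the discriminant
    return (-b + s) / (2 * a), (-b - s) / (2 * a)

def residual(a, b, c, x):
    """Value of a x^2 + b x + c; it should be near 0 at a solution."""
    return a * x * x + b * x + c

# Example 1.7.1: x^2 - 2ix - 5 = 0, with solutions i + 2 and i - 2.
roots = quadratic_roots(1, -2j, -5)
```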



1.8 Exercises 

1. Let $z = 5 + i9$. Find $z^{-1}$.

2. Let $z = 2 + i7$ and let $w = 3 - i8$. Find $zw$, $z + w$, $z^2$, and $w/z$.

3. Give the complete solution to $x^4 + 16 = 0$.

4. Graph the complex cube roots of 8 in the complex plane. Do the same for the four fourth
roots of 16. ►

5. If z is a complex number, show there exists $\omega$ a complex number with $|\omega| = 1$ and $\omega z = |z|$.

6. De Moivre's theorem says $[r(\cos t + i\sin t)]^n = r^n(\cos nt + i\sin nt)$ for n a positive integer.
Does this formula continue to hold for all integers n, even negative integers? Explain. ►

7. You already know formulas for $\cos(x + y)$ and $\sin(x + y)$ and these were used to prove De
Moivre's theorem. Now using De Moivre's theorem, derive a formula for $\sin(5x)$ and one for
$\cos(5x)$. ►

8. If z and w are two complex numbers and the polar form of z involves the angle $\theta$ while the
polar form of w involves the angle $\phi$, show that in the polar form for $zw$ the angle involved is
$\theta + \phi$. Also, show that in the polar form of a complex number z, $r = |z|$.

9. Factor $x^3 + 8$ as a product of linear factors.

10. Write $x^3 + 27$ in the form $(x + 3)(x^2 + ax + b)$ where $x^2 + ax + b$ cannot be factored any more
using only real numbers.

11. Completely factor $x^4 + 16$ as a product of linear factors.

12. Factor $x^4 + 16$ as the product of two quadratic polynomials each of which cannot be factored
further without using complex numbers.

13. If z, w are complex numbers prove $\overline{zw} = \overline{z}\,\overline{w}$ and then show by induction that $\overline{z_1 \cdots z_m} =
\overline{z_1} \cdots \overline{z_m}$. Also verify that $\overline{\sum_{k=1}^m z_k} = \sum_{k=1}^m \overline{z_k}$. In words this says the conjugate of a product
equals the product of the conjugates and the conjugate of a sum equals the sum of the
conjugates.

14. Suppose $p(x) = a_n x^n + a_{n-1}x^{n-1} + \cdots + a_1 x + a_0$ where all the $a_k$ are real numbers. Suppose
also that $p(z) = 0$ for some $z \in \mathbb{C}$. Show it follows that $p(\overline{z}) = 0$ also.

15. Show that $1 + i$, $2 + i$ are the only two zeros to

$$p(x) = x^2 - (3 + 2i)x + (1 + 3i)$$

so the zeros do not necessarily come in conjugate pairs if the coefficients are not real.

16. I claim that $1 = -1$. Here is why.

$$-1 = i^2 = \sqrt{-1}\sqrt{-1} = \sqrt{(-1)^2} = \sqrt{1} = 1.$$

This is clearly a remarkable result but is there something wrong with it? If so, what is wrong?

17. De Moivre's theorem is really a grand thing. I plan to use it now for rational exponents, not
just integers.

$$1 = 1^{(1/4)} = (\cos 2\pi + i\sin 2\pi)^{1/4} = \cos(\pi/2) + i\sin(\pi/2) = i.$$

Therefore, squaring both sides it follows $1 = -1$ as in the previous problem. What does this
tell you about De Moivre's theorem? Is there a profound difference between raising numbers
to integer powers and raising numbers to non integer powers?

18. Review Problem 6 at this point. Now here is another question: If n is an integer, is it always
true that $(\cos\theta - i\sin\theta)^n = \cos(n\theta) - i\sin(n\theta)$? Explain.
19. Suppose you have any polynomial in $\cos\theta$ and $\sin\theta$. By this I mean an expression of the
form $\sum_{\alpha=0}^{m}\sum_{\beta=0}^{n} a_{\alpha\beta}\cos^\alpha\theta\,\sin^\beta\theta$ where $a_{\alpha\beta} \in \mathbb{C}$. Can this always be written in the form
$\sum_{\gamma=-(n+m)}^{n+m} b_\gamma\cos\gamma\theta + \sum_{\tau=-(n+m)}^{n+m} c_\tau\sin\tau\theta$? Explain.

20. Suppose $p(x) = a_n x^n + a_{n-1}x^{n-1} + \cdots + a_1 x + a_0$ is a polynomial and it has n zeros,
$z_1, z_2, \cdots, z_n$, listed according to multiplicity. (z is a root of multiplicity m if the polynomial $f(x) = (x - z)^m$
divides $p(x)$ but $(x - z)f(x)$ does not.) Show that

$$p(x) = a_n(x - z_1)(x - z_2)\cdots(x - z_n).$$

21. Give the solutions to the following quadratic equations having real coefficients.

(a) $x^2 - 2x + 2 = 0$

(b) $3x^2 + x + 3 = 0$

(c) $x^2 - 6x + 13 = 0$

(d) $x^2 + 4x + 9 = 0$

(e) $4x^2 + 4x + 5 = 0$

22. Give the solutions to the following quadratic equations having complex coefficients. Note how
the solutions do not come in conjugate pairs as they do when the equation has real coefficients.

(a) $x^2 + 2x + 1 + i = 0$

(b) $4x^2 + 4ix - 5 = 0$

(c) $4x^2 + (4 + 4i)x + 1 + 2i = 0$

(d) $x^2 - 4ix - 5 = 0$

(e) $3x^2 + (1 - i)x + 3i = 0$

23. Prove the fundamental theorem of algebra for quadratic polynomials having coefficients in $\mathbb{C}$.
That is, show that an equation of the form $ax^2 + bx + c = 0$, where a, b, c are complex numbers,
$a \neq 0$, has a complex solution. Hint: Consider the fact, noted earlier, that the expressions
given from the quadratic formula do in fact serve as solutions.
The notation $\mathbb{C}^n$ refers to the collection of ordered lists of n complex numbers. Since every real
number is also a complex number, this simply generalizes the usual notion of $\mathbb{R}^n$, the collection of
all ordered lists of n real numbers. In order to avoid worrying about whether it is real or complex
numbers which are being referred to, the symbol $\mathbb{F}$ will be used. If it is not clear, always pick $\mathbb{C}$.

Definition 2.0.1 Define $\mathbb{F}^n = \{(x_1, \cdots, x_n) : x_j \in \mathbb{F} \text{ for } j = 1, \cdots, n\}$.

$$(x_1, \cdots, x_n) = (y_1, \cdots, y_n)$$

if and only if for all $j = 1, \cdots, n$, $x_j = y_j$. When $(x_1, \cdots, x_n) \in \mathbb{F}^n$, it is conventional to denote
$(x_1, \cdots, x_n)$ by the single boldface letter $\mathbf{x}$. The numbers $x_j$ are called the coordinates. Elements
in $\mathbb{F}^n$ are called vectors. The set

$$\{(0, \cdots, 0, t, 0, \cdots, 0) : t \in \mathbb{R}\}$$

for t in the $i^{th}$ slot is called the $i^{th}$ coordinate axis in the case of $\mathbb{R}^n$. The point $\mathbf{0} = (0, \cdots, 0)$ is
called the origin.

Thus $(1, 2, 4i) \in \mathbb{F}^3$ and $(2, 1, 4i) \in \mathbb{F}^3$ but $(1, 2, 4i) \neq (2, 1, 4i)$ because, even though the same
numbers are involved, they don't match up. In particular, the first entries are not equal.
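
The definition of equality for ordered lists is the same one Python uses for tuples, which compare entry by entry and in order. The following sketch, with variable names chosen only for illustration, contrasts this with sets, where order is ignored.

```python
# Two elements of F^3 built from the same numbers in a different order.
x = (1, 2, 4j)
y = (2, 1, 4j)

equal_as_vectors = x == y           # entries are compared slot by slot
equal_as_sets = set(x) == set(y)    # sets forget the order of the entries

# The origin of F^3 and a point on the second coordinate axis.
origin = (0, 0, 0)
on_second_axis = (0, 7.5, 0)
```

Here `equal_as_vectors` is `False` because the first entries differ, while `equal_as_sets` is `True`.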

The geometric significance of $\mathbb{R}^n$ for $n \leq 3$ has been encountered already in calculus or in precalculus.
Here is a short review. First consider the case when $n = 1$. Then from the definition,
$\mathbb{R}^1 = \mathbb{R}$. Recall that $\mathbb{R}$ is identified with the points of a line. Look at the number line again.
Observe that this amounts to identifying a point on this line with a real number. In other words a 
real number determines where you are on this line. Now suppose n = 2 and consider two lines which 
intersect each other at right angles as shown in the following picture. 

[Figure: coordinate axes in the plane with the points (2, 6) and (-8, 3) marked.]



Notice how you can identify a point shown in the plane with the ordered pair, (2, 6) . You go to 
the right a distance of 2 and then up a distance of 6. Similarly, you can identify another point in the 
plane with the ordered pair (—8, 3) . Go to the left a distance of 8 and then up a distance of 3. The 
reason you go to the left is that there is a — sign on the eight. From this reasoning, every ordered 
pair determines a unique point in the plane. Conversely, taking a point in the plane, you could draw 
two lines through the point, one vertical and the other horizontal and determine unique points, $x_1$
on the horizontal line in the above picture and $x_2$ on the vertical line in the above picture, such that
the point of interest is identified with the ordered pair, $(x_1, x_2)$. In short, points in the plane can be
identified with ordered pairs similar to the way that points on the real line are identified with real 
numbers. Now suppose n = 3. As just explained, the first two coordinates determine a point in a 
plane. Letting the third component determine how far up or down you go, depending on whether 
this number is positive or negative, this determines a point in space. Thus, (1,4, —5) would mean 
to determine the point in the plane that goes with (1,4) and then to go below this plane a distance 
of 5 to obtain a unique point in space. You see that the ordered triples correspond to points in 
space just as the ordered pairs correspond to points in a plane and single real numbers correspond 
to points on a line.

You can't stop here and say that you are only interested in $n \le 3$. What if you were interested
in the motion of two objects? You would need three coordinates to describe where the first object 
is and you would need another three coordinates to describe where the other object is located. 
Therefore, you would need to be considering $\mathbb{R}^6$. If the two objects moved around, you would need
a time coordinate as well. As another example, consider a hot object which is cooling and suppose 
you want the temperature of this object. How many coordinates would be needed? You would need 
one for the temperature, three for the position of the point in the object and one more for the time.
Thus you would need to be considering $\mathbb{R}^5$. Many other examples can be given. Sometimes $n$ is very
large. This is often the case in applications to business when they are trying to maximize profit 
subject to constraints. It also occurs in numerical analysis when people try to solve hard problems 
on a computer. 

There are other ways to identify points in space with three numbers but the one presented is
the most basic. In this case, the coordinates are known as Cartesian coordinates after Descartes¹
who invented this idea in the first half of the seventeenth century. I will often not bother to draw a
distinction between the point in space and its Cartesian coordinates.

The geometric significance of $\mathbb{C}^n$ for $n > 1$ is not available because each copy of $\mathbb{C}$ corresponds
to the plane or $\mathbb{R}^2$.

2.1 Algebra in $\mathbb{F}^n$

There are two algebraic operations done with elements of $\mathbb{F}^n$. One is addition and the other is
multiplication by numbers, called scalars. In the case of $\mathbb{C}^n$ the scalars are complex numbers while
in the case of $\mathbb{R}^n$ the only allowed scalars are real numbers. Thus, the scalars always come from $\mathbb{F}$
in either case.

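Definition 2.1.1 If $\mathbf{x} \in \mathbb{F}^n$ and $a \in \mathbb{F}$, also called a scalar, then $a\mathbf{x} \in \mathbb{F}^n$ is defined by

$$a\mathbf{x} = a(x_1, \cdots, x_n) = (ax_1, \cdots, ax_n). \tag{2.1}$$

This is known as scalar multiplication. If $\mathbf{x}, \mathbf{y} \in \mathbb{F}^n$ then $\mathbf{x} + \mathbf{y} \in \mathbb{F}^n$ and is defined by

$$\mathbf{x} + \mathbf{y} = (x_1, \cdots, x_n) + (y_1, \cdots, y_n) = (x_1 + y_1, \cdots, x_n + y_n). \tag{2.2}$$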

$\mathbb{F}^n$ is often called $n$ dimensional space. With this definition, vector addition and scalar multiplication
satisfy the conclusions of the following theorem. More generally, these properties are called
the vector space axioms.

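Theorem 2.1.2 For $\mathbf{v}, \mathbf{w}, \mathbf{z} \in \mathbb{F}^n$ and $\alpha, \beta$ scalars (elements of $\mathbb{F}$), the following hold.

$$\mathbf{v} + \mathbf{w} = \mathbf{w} + \mathbf{v}, \tag{2.3}$$

the commutative law of addition,

$$(\mathbf{v} + \mathbf{w}) + \mathbf{z} = \mathbf{v} + (\mathbf{w} + \mathbf{z}), \tag{2.4}$$

the associative law for addition,

$$\mathbf{v} + \mathbf{0} = \mathbf{v}, \tag{2.5}$$

the existence of an additive identity,

$$\mathbf{v} + (-\mathbf{v}) = \mathbf{0}, \tag{2.6}$$

the existence of an additive inverse. Also

$$\alpha(\mathbf{v} + \mathbf{w}) = \alpha\mathbf{v} + \alpha\mathbf{w}, \tag{2.7}$$

$$(\alpha + \beta)\mathbf{v} = \alpha\mathbf{v} + \beta\mathbf{v}, \tag{2.8}$$

$$\alpha(\beta\mathbf{v}) = \alpha\beta(\mathbf{v}), \tag{2.9}$$

$$1\mathbf{v} = \mathbf{v}. \tag{2.10}$$

In the above $\mathbf{0} = (0, \cdots, 0)$.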



¹ René Descartes (1596-1650) is often credited with inventing analytic geometry although it seems the ideas were
actually known much earlier. He was interested in many different subjects, physiology, chemistry, and physics being
some of them. He also wrote a large book in which he tried to explain the book of Genesis scientifically. Descartes
ended up dying in Sweden.

You should verify these properties all hold. For example, consider 2.7

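$$\alpha(\mathbf{v} + \mathbf{w}) = \alpha(v_1 + w_1, \cdots, v_n + w_n)$$
$$= (\alpha(v_1 + w_1), \cdots, \alpha(v_n + w_n))$$
$$= (\alpha v_1 + \alpha w_1, \cdots, \alpha v_n + \alpha w_n)$$
$$= (\alpha v_1, \cdots, \alpha v_n) + (\alpha w_1, \cdots, \alpha w_n) = \alpha\mathbf{v} + \alpha\mathbf{w}.$$

As usual subtraction is defined as $\mathbf{x} - \mathbf{y} = \mathbf{x} + (-\mathbf{y})$.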
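
The componentwise definitions 2.1 and 2.2 can be tried out numerically. What follows is only a minimal sketch in Python, not part of the text's development: the helper names `scalar_mul` and `add` are made up for illustration, and vectors are modelled as plain tuples. It checks the distributive law 2.7 on one sample pair of vectors in $\mathbb{C}^3$.

```python
def scalar_mul(a, x):
    """Scalar multiplication as in 2.1: multiply every entry of x by a."""
    return tuple(a * xi for xi in x)


def add(x, y):
    """Vector addition as in 2.2: add the entries slot by slot."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same number of entries")
    return tuple(xi + yi for xi, yi in zip(x, y))


# Sample vectors with complex entries (4j is Python's notation for 4i).
v = (1, 2, 4j)
w = (2, 1, 4j)
a = 3

lhs = scalar_mul(a, add(v, w))                  # a (v + w)
rhs = add(scalar_mul(a, v), scalar_mul(a, w))   # a v + a w
print(lhs == rhs)
```

A single numerical check like this is of course not a proof. The argument above is what shows 2.7 holds for every choice of vectors and scalar.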

2.2 Geometric Meaning Of Vectors 

The geometric meaning is especially significant in the case of R n for n = 2,3. Here is a short 
discussion of this topic. 

Definition 2.2.1 Let $\mathbf{x} = (x_1, \cdots, x_n)$ be the coordinates of a point in $\mathbb{R}^n$. Imagine an arrow with
its tail at $\mathbf{0} = (0, \cdots, 0)$ and its point at $\mathbf{x}$ as shown in the following picture in the case of $\mathbb{R}^3$.

[Figure: an arrow from the origin to the point $(x_1, x_2, x_3) = \mathbf{x}$.]

Then this arrow is called the position vector of the point $\mathbf{x}$. Given two points, $P, Q$ whose
coordinates are $(p_1, \cdots, p_n)$ and $(q_1, \cdots, q_n)$ respectively, one can also determine the position vector
from $P$ to $Q$ defined as follows.

$$\overrightarrow{PQ} = (q_1 - p_1, \cdots, q_n - p_n)$$

Thus every point determines a vector and conversely, every such vector (arrow) which has its tail
at $\mathbf{0}$ determines a point of $\mathbb{R}^n$, namely the point of $\mathbb{R}^n$ which coincides with the point of the vector.
Also two different points determine a position vector going from one to the other as just explained.

Imagine taking the above position vector and moving it around, always keeping it pointing in 
the same direction as shown in the following picture. 




After moving it around, it is regarded as the same vector because it points in the same direction
and has the same length.² Thus each of the arrows in the above picture is regarded as the same
vector. The components of this vector are the numbers, $x_1, \cdots, x_n$. You should think of these
numbers as directions for obtaining an arrow. Starting at some point $(a_1, a_2, \cdots, a_n)$ in $\mathbb{R}^n$, you
move to the point $(a_1 + x_1, a_2, \cdots, a_n)$ and from there to the point $(a_1 + x_1, a_2 + x_2, a_3, \cdots, a_n)$ and
then to $(a_1 + x_1, a_2 + x_2, a_3 + x_3, \cdots, a_n)$ and continue this way until you obtain the point

$$(a_1 + x_1, a_2 + x_2, \cdots, a_n + x_n).$$



² I will discuss how to define length later. For now, it is only necessary to observe that the length should be defined
in such a way that it does not change when such motion takes place.

The arrow having its tail at $(a_1, a_2, \cdots, a_n)$ and its point at

$$(a_1 + x_1, a_2 + x_2, \cdots, a_n + x_n)$$

looks just like the arrow which has its tail at $\mathbf{0}$ and its point at $(x_1, \cdots, x_n)$ so it is regarded as the
same vector.



2.3 Geometric Meaning Of Vector Addition 

It was explained earlier that an element of R n is an n tuple of numbers and it was also shown that 
this can be used to determine a point in three dimensional space in the case where n = 3 and in two 
dimensional space, in the case where n = 2. This point was specified relative to some coordinate 
axes. 

Consider the case where $n = 3$ for now. If you draw an arrow from the point in three dimensional
space determined by $(0, 0, 0)$ to the point $(a, b, c)$ with its tail sitting at the point $(0, 0, 0)$ and its
point at the point $(a, b, c)$, this arrow is called the position vector of the point determined by
$\mathbf{u} = (a, b, c)$. One way to get to this point is to start at $(0, 0, 0)$ and move in the direction of the
$x_1$ axis to $(a, 0, 0)$ and then in the direction of the $x_2$ axis to $(a, b, 0)$ and finally in the direction
of the $x_3$ axis to $(a, b, c)$. It is evident that the same arrow (vector) would result if you began at
the point $\mathbf{v} = (d, e, f)$, moved in the direction of the $x_1$ axis to $(d + a, e, f)$, then in the direction
of the $x_2$ axis to $(d + a, e + b, f)$, and finally in the $x_3$ direction to $(d + a, e + b, f + c)$ only this
time, the arrow would have its tail sitting at the point determined by $\mathbf{v} = (d, e, f)$ and its point
at $(d + a, e + b, f + c)$. It is said to be the same arrow (vector) because it will point in the same
direction and have the same length. It is like you took an actual arrow, the sort of thing you shoot
with a bow, and moved it from one location to another keeping it pointing the same direction. This is
illustrated in the following picture in which $\mathbf{v} + \mathbf{u}$ is illustrated. Note the parallelogram determined
in the picture by the vectors $\mathbf{u}$ and $\mathbf{v}$.




Thus the geometric significance of $(d, e, f) + (a, b, c) = (d + a, e + b, f + c)$ is this. You start
with the position vector of the point $(d, e, f)$ and at its point, you place the vector determined by
$(a, b, c)$ with its tail at $(d, e, f)$. Then the point of this last vector will be $(d + a, e + b, f + c)$. This
is the geometric significance of vector addition. Also, as shown in the picture, $\mathbf{u} + \mathbf{v}$ is the directed
diagonal of the parallelogram determined by the two vectors $\mathbf{u}$ and $\mathbf{v}$. A similar interpretation holds
in $\mathbb{R}^n$, $n > 3$ but I can't draw a picture in this case.

Since the convention is that identical arrows pointing in the same direction represent the same 
vector, the geometric significance of vector addition is as follows in any number of dimensions. 

Procedure 2.3.1 Let u and v be two vectors. Slide v so that the tail of v is on the point of u. 
Then draw the arrow which goes from the tail of u to the point of the slid vector v. This arrow 
represents the vector u + v. 




Note that $P + \overrightarrow{PQ} = Q$.



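2.4 Distance Between Points In $\mathbb{R}^n$, Length Of A Vector

How is distance between two points in $\mathbb{R}^n$ defined?

Definition 2.4.1 Let $\mathbf{x} = (x_1, \cdots, x_n)$ and $\mathbf{y} = (y_1, \cdots, y_n)$ be two points in $\mathbb{R}^n$. Then $|\mathbf{x} - \mathbf{y}|$
indicates the distance between these points and is defined as

$$\text{distance between } \mathbf{x} \text{ and } \mathbf{y} = |\mathbf{x} - \mathbf{y}| = \left(\sum_{k=1}^{n} |x_k - y_k|^2\right)^{1/2}.$$

This is called the distance formula. Thus $|\mathbf{x}| = |\mathbf{x} - \mathbf{0}|$. The symbol $B(\mathbf{a}, r)$ is defined by

$$B(\mathbf{a}, r) = \{\mathbf{x} \in \mathbb{R}^n : |\mathbf{x} - \mathbf{a}| < r\}.$$

This is called an open ball of radius $r$ centered at $\mathbf{a}$. It means all points in $\mathbb{R}^n$ which are closer to
$\mathbf{a}$ than $r$. The length of a vector $\mathbf{x}$ is the distance between $\mathbf{x}$ and $\mathbf{0}$.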

First of all, note this is a generalization of the notion of distance in $\mathbb{R}$. There the distance between
two points, $x$ and $y$ was given by the absolute value of their difference. Thus $|x - y|$ is equal to
the distance between these two points on $\mathbb{R}$. Now $|x - y| = \left((x - y)^2\right)^{1/2}$ where the square root is
always the positive square root. Thus it is the same formula as the above definition except there is
only one term in the sum. Geometrically, this is the right way to define distance which is seen from
the Pythagorean theorem. Often people use two lines to denote this distance, $\|\mathbf{x} - \mathbf{y}\|$. However, I
want to emphasize this is really just like the absolute value. Also, the notation I am using is fairly
standard.

Consider the following picture in the case that n = 2. 

[Figure: the points $(x_1, x_2)$ and $(y_1, y_2)$ joined by a solid line, with a dotted rectangle whose corner $(y_1, x_2)$ is also labeled.]

There are two points in the plane whose Cartesian coordinates are $(x_1, x_2)$ and $(y_1, y_2)$ respectively.
Then the solid line joining these two points is the hypotenuse of a right triangle which is
half of the rectangle shown in dotted lines. What is its length? Note the lengths of the sides of this
triangle are $|y_1 - x_1|$ and $|y_2 - x_2|$. Therefore, the Pythagorean theorem implies the length of the
hypotenuse equals

$$\left(|y_1 - x_1|^2 + |y_2 - x_2|^2\right)^{1/2} = \left((y_1 - x_1)^2 + (y_2 - x_2)^2\right)^{1/2}$$

which is just the formula for the distance given above. In other words, this distance defined above
is the same as the distance of plane geometry in which the Pythagorean theorem holds.

Now suppose $n = 3$ and let $(x_1, x_2, x_3)$ and $(y_1, y_2, y_3)$ be two points in $\mathbb{R}^3$. Consider the following
picture in which one of the solid lines joins the two points and a dotted line joins the points $(x_1, x_2, x_3)$
and $(y_1, y_2, x_3)$.

[Figure: the points $(x_1, x_2, x_3)$, $(y_1, x_2, x_3)$, $(y_1, y_2, x_3)$ and $(y_1, y_2, y_3)$ in space.]

By the Pythagorean theorem, the length of the dotted line joining $(x_1, x_2, x_3)$ and $(y_1, y_2, x_3)$
equals

$$\left((y_1 - x_1)^2 + (y_2 - x_2)^2\right)^{1/2}$$

while the length of the line joining $(y_1, y_2, x_3)$ to $(y_1, y_2, y_3)$ is just $|y_3 - x_3|$. Therefore, by the
Pythagorean theorem again, the length of the line joining the points $(x_1, x_2, x_3)$ and $(y_1, y_2, y_3)$
equals

$$\left\{\left[\left((y_1 - x_1)^2 + (y_2 - x_2)^2\right)^{1/2}\right]^2 + (y_3 - x_3)^2\right\}^{1/2} = \left((y_1 - x_1)^2 + (y_2 - x_2)^2 + (y_3 - x_3)^2\right)^{1/2},$$

which is again just the distance formula above.

This completes the argument that the above definition is reasonable. Of course you cannot 
continue drawing pictures in ever higher dimensions but there is no problem with the formula for 
distance in any number of dimensions. Here is an example. 

Example 2.4.2 Find the distance between the points in $\mathbb{R}^4$,

$$\mathbf{a} = (1, 2, -4, 6)$$

and

$$\mathbf{b} = (2, 3, -1, 0).$$

Use the distance formula and write

$$|\mathbf{a} - \mathbf{b}|^2 = (1 - 2)^2 + (2 - 3)^2 + (-4 - (-1))^2 + (6 - 0)^2 = 47.$$

Therefore, $|\mathbf{a} - \mathbf{b}| = \sqrt{47}$.

All this amounts to defining the distance between two points as the length of a straight line 
joining these two points. However, there is nothing sacred about using straight lines. One could 
define the distance to be the length of some other sort of line joining these points. It won't be done 
in this book but sometimes this sort of thing is done. 
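
The straight-line distance formula of Definition 2.4.1 is short enough to sketch in a few lines of Python. This is only an illustration under the assumption that points are stored as tuples; the function name `distance` is made up here. It reproduces the numbers of Example 2.4.2.

```python
import math


def distance(x, y):
    """Distance formula: square root of the sum of squared entry differences."""
    if len(x) != len(y):
        raise ValueError("points must have the same number of coordinates")
    return math.sqrt(sum(abs(xk - yk) ** 2 for xk, yk in zip(x, y)))


a = (1, 2, -4, 6)
b = (2, 3, -1, 0)

# The squared distance from Example 2.4.2, computed exactly with integers.
squared = sum((ak - bk) ** 2 for ak, bk in zip(a, b))
print(squared)          # 47
print(distance(a, b))   # the square root of 47, as a floating point number
```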

Another convention which is usually followed, especially in $\mathbb{R}^2$ and $\mathbb{R}^3$ is to denote the first
component of a point in $\mathbb{R}^2$ by $x$ and the second component by $y$. In $\mathbb{R}^3$ it is customary to denote
the first and second components as just described while the third component is called $z$.

Example 2.4.3 Describe the points which are at the same distance from $(1, 2, 3)$ and $(0, 1, 2)$.

Let $(x, y, z)$ be such a point. Then

$$\sqrt{(x - 1)^2 + (y - 2)^2 + (z - 3)^2} = \sqrt{x^2 + (y - 1)^2 + (z - 2)^2}.$$

Squaring both sides

$$(x - 1)^2 + (y - 2)^2 + (z - 3)^2 = x^2 + (y - 1)^2 + (z - 2)^2$$

and so

$$x^2 - 2x + 14 + y^2 - 4y + z^2 - 6z = x^2 + y^2 - 2y + 5 + z^2 - 4z$$

which implies

$$-2x + 14 - 4y - 6z = -2y + 5 - 4z$$

and so

$$2x + 2y + 2z = 9. \tag{2.11}$$

Since these steps are reversible, the set of points which is at the same distance from the two given
points consists of the points, $(x, y, z)$ such that 2.11 holds.
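
The conclusion of Example 2.4.3 can be spot-checked with exact arithmetic. The sketch below is an illustration only. The helper names `dist_sq` and `on_plane` are invented for it, and it assumes the plane is $2x + 2y + 2z = 9$, which is what the algebra above gives. It picks a few points on that plane and compares their squared distances to the two given points.

```python
from fractions import Fraction


def dist_sq(p, q):
    """Squared distance between two points, kept exact by avoiding square roots."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))


P = (1, 2, 3)
Q = (0, 1, 2)


def on_plane(x, y):
    """Return the point (x, y, z) with z chosen so that 2x + 2y + 2z = 9."""
    z = (Fraction(9) - 2 * x - 2 * y) / 2
    return (Fraction(x), Fraction(y), z)


samples = [on_plane(0, 0), on_plane(1, -3), on_plane(Fraction(1, 2), Fraction(3, 2))]
checks = [dist_sq(pt, P) == dist_sq(pt, Q) for pt in samples]
print(checks)

# A point off the plane, such as the origin, is not equidistant.
print(dist_sq((0, 0, 0), P) == dist_sq((0, 0, 0), Q))
```

The third sample is the midpoint of the segment joining the two given points, which must be equidistant from both.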

There are certain properties of the distance which are obvious. Two of them which follow directly
from the definition are

$$|\mathbf{x} - \mathbf{y}| = |\mathbf{y} - \mathbf{x}|,$$

$$|\mathbf{x} - \mathbf{y}| \ge 0 \text{ and equals } 0 \text{ only if } \mathbf{y} = \mathbf{x}.$$

The third fundamental property of distance is known as the triangle inequality. Recall that in any
triangle the sum of the lengths of two sides is always at least as large as the third side. I will show
you a proof of this later. This is usually stated as

$$|\mathbf{x} + \mathbf{y}| \le |\mathbf{x}| + |\mathbf{y}|.$$

Here is a picture which illustrates the statement of this inequality in terms of geometry.

2.5 Geometric Meaning Of Scalar Multiplication

As discussed earlier, $\mathbf{x} = (x_1, x_2, x_3)$ determines a vector. You draw the line from $\mathbf{0}$ to $\mathbf{x}$ placing the
point of the vector on $\mathbf{x}$. What is the length of this vector? The length of this vector is defined to
equal $|\mathbf{x}|$ as in Definition 2.4.1. Thus the length of $\mathbf{x}$ equals $\sqrt{x_1^2 + x_2^2 + x_3^2}$. When you multiply $\mathbf{x}$
by a scalar $\alpha$, you get $(\alpha x_1, \alpha x_2, \alpha x_3)$ and the length of this vector is defined as

$$\sqrt{(\alpha x_1)^2 + (\alpha x_2)^2 + (\alpha x_3)^2} = |\alpha| \sqrt{x_1^2 + x_2^2 + x_3^2}.$$

Thus the following holds.

$$|\alpha\mathbf{x}| = |\alpha| |\mathbf{x}|.$$

In other words, multiplication by a scalar $\alpha$ multiplies the length of the vector by $|\alpha|$. What about the
direction? You should convince yourself by drawing a picture that if $\alpha$ is negative, it causes the
resulting vector to point in the opposite direction while if $\alpha > 0$ it preserves the direction the vector
points.

You can think of vectors as quantities which have direction and magnitude, little arrows. Thus 
any two little arrows which have the same length and point in the same direction are considered to 
be the same vector even if their tails are at different points. 




You can always slide such an arrow and place its tail at the origin. If the resulting point of the
vector is $(a, b, c)$, it is clear the length of the little arrow is $\sqrt{a^2 + b^2 + c^2}$. Geometrically, the way
you add two geometric vectors is to place the tail of one on the point of the other and then to form
the vector which results by starting with the tail of the first and ending with this point as illustrated
in the following picture. Also when $(a, b, c)$ is referred to as a vector, you mean any of the arrows
which have the same direction and magnitude as the position vector of this point. Geometrically,
for $\mathbf{u} = (u_1, u_2, u_3)$, $\alpha\mathbf{u}$ is any of the little arrows which have the same direction and magnitude as
$(\alpha u_1, \alpha u_2, \alpha u_3)$.




The following example is art which illustrates these definitions and conventions.

Exercise 2.5.1 Here is a picture of two vectors, $\mathbf{u}$ and $\mathbf{v}$.




Sketch a picture of u + v, u — v, and u+2v. 

First here is a picture of u + v. You first draw u and then at the point of u you place the tail of 
v as shown. Then u + v is the vector which results which is drawn in the following pretty picture. 




Next consider $\mathbf{u} - \mathbf{v}$. This means $\mathbf{u} + (-\mathbf{v})$. From the above geometric description of vector
addition, $-\mathbf{v}$ is the vector which has the same length but which points in the opposite direction to
$\mathbf{v}$. Here is a picture.

[Figure: the vector $\mathbf{u} + (-\mathbf{v})$.]




Finally consider the vector u+2v. Here is a picture of this one also. 

[Figure: the vectors $2\mathbf{v}$ and $\mathbf{u} + 2\mathbf{v}$.]

2.6 Exercises 

1. Verify all the properties 2.3-2.10. 

2. Compute 5(1,2 + 3i, 3, -2) + 6 (2 - i, 1, -2, 7) . 

3. Draw a picture of the points in R 2 which are determined by the following ordered pairs. 

(a) (1,2) 

(b) (-2,-2) 

(c) (-2,3) 

(d) (2,-5) 

4. Does it make sense to write (1, 2) + (2, 3, 1)? Explain. 

5. Draw a picture of the points in $\mathbb{R}^3$ which are determined by the following ordered triples.

(a) (1,2,0)

(b) (-2,-2,1) 

(c) (-2,3,-2) 

2.7 Vectors And Physics 

Suppose you push on something. What is important? There are really two things which are impor- 
tant, how hard you push and the direction you push. This illustrates the concept of force. 

Definition 2.7.1 Force is a vector. The magnitude of this vector is a measure of how hard it is 
pushing. It is measured in units such as Newtons or pounds or tons. Its direction is the direction in 
which the push is taking place. 

Vectors are used to model force and other physical vectors like velocity. What was just described 
would be called a force vector. It has two essential ingredients, its magnitude and its direction. 
Geometrically think of vectors as directed line segments or arrows as shown in the following picture 
in which all the directed line segments are considered to be the same vector because they have the 
same direction, the direction in which the arrows point, and the same magnitude (length). 




Because only direction and magnitude are important, it is always possible to put
a vector in a certain particularly simple form. Let $\overrightarrow{\mathbf{p}\mathbf{q}}$ be a directed line segment or vector. Then it
follows that $\overrightarrow{\mathbf{p}\mathbf{q}}$ consists of the points of the form

$$\mathbf{p} + t(\mathbf{q} - \mathbf{p})$$

where $t \in [0, 1]$. Subtract $\mathbf{p}$ from all these points to obtain the directed line segment consisting of
the points

$$\mathbf{0} + t(\mathbf{q} - \mathbf{p}), \quad t \in [0, 1].$$

The point in $\mathbb{R}^n$, $\mathbf{q} - \mathbf{p}$, will represent the vector.
Geometrically, the arrow, $\overrightarrow{\mathbf{p}\mathbf{q}}$, was slid so it points in the same direction and the base is at the
origin, $\mathbf{0}$. For example, see the following picture.

In this way vectors can be identified with points of $\mathbb{R}^n$.

Definition 2.7.2 Let $\mathbf{x} = (x_1, \cdots, x_n) \in \mathbb{R}^n$. The position vector of this point is the vector
whose point is at $\mathbf{x}$ and whose tail is at the origin, $(0, \cdots, 0)$. If $\mathbf{x} = (x_1, \cdots, x_n)$ is called a vector,
the vector which is meant is this position vector just described. Another term associated with this is
standard position. A vector is in standard position if the tail is placed at the origin.

It is customary to identify the point in $\mathbb{R}^n$ with its position vector.

The magnitude of a vector determined by a directed line segment $\overrightarrow{\mathbf{p}\mathbf{q}}$ is just the distance between
the point $\mathbf{p}$ and the point $\mathbf{q}$. By the distance formula this equals

$$\left(\sum_{k=1}^{n} (q_k - p_k)^2\right)^{1/2} = |\mathbf{p} - \mathbf{q}|$$

and for $\mathbf{v}$ any vector in $\mathbb{R}^n$ the magnitude of $\mathbf{v}$ equals $\left(\sum_{k=1}^{n} v_k^2\right)^{1/2} = |\mathbf{v}|$.

Example 2.7.3 Consider the vector $\mathbf{v} = (1, 2, 3)$ in $\mathbb{R}^3$. Find $|\mathbf{v}|$.

First, the vector is the directed line segment (arrow) which has its base at $\mathbf{0} = (0, 0, 0)$ and its
point at $(1, 2, 3)$. Therefore,

$$|\mathbf{v}| = \sqrt{1^2 + 2^2 + 3^2} = \sqrt{14}.$$

What is the geometric significance of scalar multiplication? If $\mathbf{a}$ represents the vector $\mathbf{v}$ in the
sense that when it is slid to place its tail at the origin, the element of $\mathbb{R}^n$ at its point is $\mathbf{a}$, what is
$r\mathbf{v}$?

$$|r\mathbf{v}| = \left(\sum_{i=1}^{n} (r a_i)^2\right)^{1/2} = \left(\sum_{i=1}^{n} r^2 a_i^2\right)^{1/2} = \left(r^2\right)^{1/2} \left(\sum_{i=1}^{n} a_i^2\right)^{1/2} = |r| |\mathbf{v}|.$$

Thus the magnitude of $r\mathbf{v}$ equals $|r|$ times the magnitude of $\mathbf{v}$. If $r$ is positive, then the vector
represented by $r\mathbf{v}$ has the same direction as the vector $\mathbf{v}$ because multiplying by the scalar $r$ only
has the effect of scaling all the distances. Thus the unit distance along any coordinate axis now has
length $r$ and in this rescaled system the vector is represented by $\mathbf{a}$. If $r < 0$ similar considerations
apply except in this case all the $a_i$ also change sign. From now on, $\mathbf{a}$ will be referred to as a vector
instead of an element of $\mathbb{R}^n$ representing a vector as just described. The following picture illustrates
the effect of scalar multiplication.

[Figure: the effect of scalar multiplication, including the vector $-2\mathbf{v}$.]



Note there are $n$ special vectors which point along the coordinate axes. These are

$$\mathbf{e}_i = (0, \cdots, 0, 1, 0, \cdots, 0)$$

where the 1 is in the $i^{th}$ slot and there are zeros in all the other spaces. See the picture in the case
of $\mathbb{R}^3$.

[Figure: the vectors $\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3$ drawn along the coordinate axes of $\mathbb{R}^3$.]

The direction of $\mathbf{e}_i$ is referred to as the $i^{th}$ direction. Given a vector $\mathbf{v} = (a_1, \cdots, a_n)$, $a_i\mathbf{e}_i$ is
the $i^{th}$ component of the vector. Thus

$$a_i\mathbf{e}_i = (0, \cdots, 0, a_i, 0, \cdots, 0)$$

and so this vector gives something possibly nonzero only in the $i^{th}$ direction. Also, knowledge of
the $i^{th}$ component of the vector for each $i$ is equivalent to knowledge of the vector because it gives the entry
in the $i^{th}$ slot and for $\mathbf{v} = (a_1, \cdots, a_n)$,

$$\mathbf{v} = \sum_{i=1}^{n} a_i\mathbf{e}_i.$$

What does addition of vectors mean physically? Suppose two forces are applied to some object. 
Each of these would be represented by a force vector and the two forces acting together would yield an 
overall force acting on the object which would also be a force vector known as the resultant. Suppose 
the two vectors are $\mathbf{a} = \sum_{i=1}^{n} a_i\mathbf{e}_i$ and $\mathbf{b} = \sum_{i=1}^{n} b_i\mathbf{e}_i$. Then the vector $\mathbf{a}$ involves a component in
the $i^{th}$ direction, $a_i\mathbf{e}_i$ while the component in the $i^{th}$ direction of $\mathbf{b}$ is $b_i\mathbf{e}_i$. Then it seems physically
reasonable that the resultant vector should have a component in the $i^{th}$ direction equal to $(a_i + b_i)\mathbf{e}_i$.
This is exactly what is obtained when the vectors, $\mathbf{a}$ and $\mathbf{b}$ are added.

$$\mathbf{a} + \mathbf{b} = (a_1 + b_1, \cdots, a_n + b_n) = \sum_{i=1}^{n} (a_i + b_i)\mathbf{e}_i.$$

Thus the addition of vectors according to the rules of addition in R n which were presented earlier, 
yields the appropriate vector which duplicates the cumulative effect of all the vectors in the sum. 
What is the geometric significance of vector addition? Suppose u, v are vectors, 

$$\mathbf{u} = (u_1, \cdots, u_n), \quad \mathbf{v} = (v_1, \cdots, v_n).$$

Then $\mathbf{u} + \mathbf{v} = (u_1 + v_1, \cdots, u_n + v_n)$. How can one obtain this geometrically? Consider the directed
line segment, $\overrightarrow{\mathbf{0}\mathbf{u}}$ and then, starting at the end of this directed line segment, follow the directed line
segment $\overrightarrow{\mathbf{u}(\mathbf{u} + \mathbf{v})}$ to its end, $\mathbf{u} + \mathbf{v}$. In other words, place the vector $\mathbf{u}$ in standard position with its
base at the origin and then slide the vector $\mathbf{v}$ till its base coincides with the point of $\mathbf{u}$. The point
of this slid vector determines $\mathbf{u} + \mathbf{v}$. To illustrate, see the following picture




Note the vector u + v is the diagonal of a parallelogram determined from the two vectors u and 
v and that identifying u + v with the directed diagonal of the parallelogram determined by the 
vectors u and v amounts to the same thing as the above procedure. 

An item of notation should be mentioned here. In the case of $\mathbb{R}^n$ where $n \le 3$, it is standard
notation to use $\mathbf{i}$ for $\mathbf{e}_1$, $\mathbf{j}$ for $\mathbf{e}_2$, and $\mathbf{k}$ for $\mathbf{e}_3$. Now here are some applications of vector addition to
some problems.

Example 2.7.4 There are three ropes attached to a car and three people pull on these ropes. The
first exerts a force of $2\mathbf{i} + 3\mathbf{j} - 2\mathbf{k}$ Newtons, the second exerts a force of $3\mathbf{i} + 5\mathbf{j} + \mathbf{k}$ Newtons and the
third exerts a force of $5\mathbf{i} - \mathbf{j} + 2\mathbf{k}$ Newtons. Find the total force in the direction of $\mathbf{i}$.

To find the total force add the vectors as described above. This gives $10\mathbf{i} + 7\mathbf{j} + \mathbf{k}$ Newtons.
Therefore, the force in the $\mathbf{i}$ direction is 10 Newtons.

As mentioned earlier, the Newton is a unit of force like pounds. 
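
The addition in Example 2.7.4 is just a componentwise sum, which the following short Python sketch carries out. Storing each force as a tuple of its $\mathbf{i}$, $\mathbf{j}$, $\mathbf{k}$ components is an assumption made for the illustration.

```python
# Each force is stored as (i component, j component, k component), in Newtons.
forces = [(2, 3, -2), (3, 5, 1), (5, -1, 2)]

# zip(*forces) groups the i components together, then the j, then the k.
total = tuple(sum(components) for components in zip(*forces))

print(total)      # (10, 7, 1)
print(total[0])   # 10, the total force in the i direction
```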

Example 2.7.5 An airplane flies North East at 100 miles per hour. Write this as a vector. 

A picture of this situation follows. 




The vector has length 100. Now using that vector as the hypotenuse of a right triangle having
equal sides, the sides should be each of length $100/\sqrt{2}$. Therefore, the vector would be
$(100/\sqrt{2})\,\mathbf{i} + (100/\sqrt{2})\,\mathbf{j}$.

This example also motivates the concept of velocity. 

Definition 2.7.6 The speed of an object is a measure of how fast it is going. It is measured in 
units of length per unit time. For example, miles per hour, kilometers per minute, feet per second. 
The velocity is a vector having the speed as the magnitude but also specifying the direction. 

Thus the velocity vector in the above example is $(100/\sqrt{2})\,\mathbf{i} + (100/\sqrt{2})\,\mathbf{j}$.

Example 2.7.7 The velocity of an airplane is $100\mathbf{i} + \mathbf{j} + \mathbf{k}$ measured in kilometers per hour and
at a certain instant of time its position is $(1, 2, 1)$. Here imagine a Cartesian coordinate system in
which the third component is altitude and the first and second components are measured on a line
from West to East and a line from South to North. Find the position of this airplane one minute
later.

The vector $(1, 2, 1)$ is the initial position vector of the airplane. As it moves, the
position vector changes. After one minute the airplane has moved in the $\mathbf{i}$ direction a distance of
$100 \times \frac{1}{60} = \frac{5}{3}$ kilometer. In the $\mathbf{j}$ direction it has moved $\frac{1}{60}$ kilometer during this same time, while it
moves $\frac{1}{60}$ kilometer in the $\mathbf{k}$ direction. Therefore, the new displacement vector for the airplane is

$$(1, 2, 1) + \left(\frac{5}{3}, \frac{1}{60}, \frac{1}{60}\right) = \left(\frac{8}{3}, \frac{121}{60}, \frac{61}{60}\right).$$
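
The computation in Example 2.7.7 is a position plus a time multiplied by a velocity. Here is a small sketch of that arithmetic using Python's exact fractions. The variable names are chosen only for this illustration.

```python
from fractions import Fraction

position = (1, 2, 1)        # kilometers
velocity = (100, 1, 1)      # kilometers per hour
t = Fraction(1, 60)         # one minute, expressed in hours

# New position = old position + t * velocity, one component at a time.
new_position = tuple(p + t * v for p, v in zip(position, velocity))

print(new_position)
```

The three entries come out as 8/3, 121/60 and 61/60, since the altitude starts at 1 and increases by 1/60.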



Example 2.7.8 A certain river is one half mile wide with a current flowing at 4 miles per hour
from East to West. A man swims directly toward the opposite shore from the South bank of the
river at a speed of 3 miles per hour. How far down the river does he find himself when he has swum
across? How far does he end up swimming?

Consider the following picture.

You should write these vectors in terms of components. The velocity of the swimmer in still water
would be $3\mathbf{j}$ while the velocity of the river would be $-4\mathbf{i}$. Therefore, the velocity of the swimmer is
$-4\mathbf{i} + 3\mathbf{j}$. Since the component of velocity in the direction across the river is 3, it follows the trip
takes 1/6 hour or 10 minutes. The speed at which he travels is $\sqrt{4^2 + 3^2} = 5$ miles per hour and
so he travels $5 \times \frac{1}{6} = \frac{5}{6}$ miles. Now to find the distance downstream he finds himself, note that
if $x$ is this distance, $x$ and 1/2 are two legs of a right triangle whose hypotenuse equals 5/6 miles.
Therefore, by the Pythagorean theorem the distance downstream is

$$\sqrt{(5/6)^2 - (1/2)^2} = \frac{2}{3} \text{ miles.}$$
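
The numbers in Example 2.7.8 can be checked in two ways: by multiplying the current's speed by the crossing time, or by the Pythagorean theorem as in the text. The following Python sketch does both. The variable names are assumptions made for the illustration.

```python
import math
from fractions import Fraction

width = Fraction(1, 2)      # miles across the river
velocity = (-4, 3)          # resulting velocity -4i + 3j, in miles per hour

time = width / velocity[1]              # hours needed to cross
speed = math.hypot(*velocity)           # length of the velocity vector
swum = speed * float(time)              # total distance traveled, in miles

# Distance downstream, first from the current alone, then from Pythagoras.
downstream = abs(velocity[0]) * time
downstream_pythagoras = math.sqrt(swum ** 2 - float(width) ** 2)

print(time, speed, swum)
print(downstream, downstream_pythagoras)
```

Both routes give two thirds of a mile downstream, up to floating point rounding in the second one.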

2.8 Exercises 

1. The wind blows from West to East at a speed of 50 miles per hour and an airplane which 
travels at 300 miles per hour in still air is heading North West. What is the velocity of the 
airplane relative to the ground? What is the component of this velocity in the direction North? 

2. In the situation of Problem 1 how many degrees to the West of North should the airplane head 
in order to fly exactly North. What will be the speed of the airplane relative to the ground? 

3. In the situation of 2 suppose the airplane uses 34 gallons of fuel every hour at that air speed 
and that it needs to fly North a distance of 600 miles. Will the airplane have enough fuel to 
arrive at its destination given that it has 63 gallons of fuel? 

4. An airplane is flying due north at 150 miles per hour. A wind is pushing the airplane due east 
at 40 miles per hour. After 1 hour, the plane starts flying 30° East of North. Assuming the 
plane starts at (0, 0) , where is it after 2 hours? Let North be the direction of the positive y 
axis and let East be the direction of the positive x axis. 

5. City A is located at the origin while city B is located at (300, 500) where distances are in miles. 
An airplane flies at 250 miles per hour in still air. This airplane wants to fly from city A to 
city B but the wind is blowing in the direction of the positive y axis at a speed of 50 miles per 
hour. Find a unit vector such that if the plane heads in this direction, it will end up at city B 
having flown the shortest possible distance. How long will it take to get there? 

6. A certain river is one half mile wide with a current flowing at 2 miles per hour from East to 
West. A man swims directly toward the opposite shore from the South bank of the river at 
a speed of 3 miles per hour. How far down the river does he find himself when he has swum
across? How far does he end up swimming?

7. A certain river is one half mile wide with a current flowing at 2 miles per hour from East to 
West. A man can swim at 3 miles per hour in still water. In what direction should he swim 
in order to travel directly across the river? What would the answer to this problem be if the 
river flowed at 3 miles per hour and the man could swim only at the rate of 2 miles per hour? 

8. Three forces are applied to a point which does not move. Two of the forces are 2i + j + 3k 
Newtons and i — 3j + 2k Newtons. Find the third force. 

9. The total force acting on an object is to be 2i + j + k Newtons. A force of — i + j + k Newtons 
is being applied. What other force should be applied to achieve the desired total force? 

10. A bird flies from its nest 5 km. in the direction 60° north of east where it stops to rest on a 
tree. It then flies 10 km. in the direction due southeast and lands atop a telephone pole. Place 
an xy coordinate system so that the origin is the bird's nest, and the positive x axis points 
east and the positive y axis points north. Find the displacement vector from the nest to the 
telephone pole.

11. A car is stuck in the mud. There is a cable stretched tightly from this car to a tree which is
20 feet long. A person grasps the cable in the middle and pulls with a force of 100 pounds 
perpendicular to the stretched cable. The center of the cable moves two feet and remains still. 
What is the tension in the cable? The tension in the cable is the force exerted on this point 
by the part of the cable nearer the car as well as the force exerted on this point by the part of 
the cable nearer the tree.

Vector Products



3.1 The Dot Product 

There are two ways of multiplying vectors which are of great importance in applications. The first of 
these is called the dot product, also called the scalar product and sometimes the inner product. 

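Definition 3.1.1 Let $\mathbf{a}, \mathbf{b}$ be two vectors in $\mathbb{R}^n$. Define $\mathbf{a} \cdot \mathbf{b}$ as

$$\mathbf{a} \cdot \mathbf{b} = \sum_{k=1}^{n} a_k b_k.$$

The dot product $\mathbf{a} \cdot \mathbf{b}$ is sometimes denoted as $(\mathbf{a}, \mathbf{b})$ where a comma replaces $\cdot$.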

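With this definition, there are several important properties satisfied by the dot product. In the
statement of these properties, $\alpha$ and $\beta$ will denote scalars and $\mathbf{a}, \mathbf{b}, \mathbf{c}$ will denote vectors.

Proposition 3.1.2 The dot product satisfies the following properties.

$$\mathbf{a} \cdot \mathbf{b} = \mathbf{b} \cdot \mathbf{a} \tag{3.1}$$

$$\mathbf{a} \cdot \mathbf{a} \ge 0 \text{ and equals zero if and only if } \mathbf{a} = \mathbf{0} \tag{3.2}$$

$$(\alpha\mathbf{a} + \beta\mathbf{b}) \cdot \mathbf{c} = \alpha(\mathbf{a} \cdot \mathbf{c}) + \beta(\mathbf{b} \cdot \mathbf{c}) \tag{3.3}$$

$$\mathbf{c} \cdot (\alpha\mathbf{a} + \beta\mathbf{b}) = \alpha(\mathbf{c} \cdot \mathbf{a}) + \beta(\mathbf{c} \cdot \mathbf{b}) \tag{3.4}$$

$$|\mathbf{a}|^2 = \mathbf{a} \cdot \mathbf{a} \tag{3.5}$$

You should verify these properties. Also be sure you understand that 3.4 follows from the first
three and is therefore redundant. It is listed here for the sake of convenience.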

Example 3.1.3 Find (1, 2, 0, -1) • (0, 1, 2, 3) . 

This equals 0 + 2 + 0 + (−3) = −1. 

Example 3.1.4 Find the magnitude of $\mathbf{a} = (2, 1, 4, 2)$. That is, find $|\mathbf{a}|$. 

This is $\sqrt{(2, 1, 4, 2) \cdot (2, 1, 4, 2)} = \sqrt{25} = 5$. 
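
The two examples above can be checked with a few lines of code. The following is a minimal sketch, assuming vectors are stored as plain Python tuples; the function names `dot` and `magnitude` are chosen here for illustration and do not come from the text.

```python
import math


def dot(a, b):
    """Dot product of two equal-length sequences of real numbers."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of components")
    return sum(x * y for x, y in zip(a, b))


def magnitude(a):
    """Length |a|, computed as the square root of a . a (property 3.5)."""
    return math.sqrt(dot(a, a))


print(dot((1, 2, 0, -1), (0, 1, 2, 3)))  # the vectors of Example 3.1.3
print(magnitude((2, 1, 4, 2)))           # the vector of Example 3.1.4
```

Defining the magnitude through `dot` mirrors property 3.5, so any change to the product automatically changes the length that goes with it.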

The dot product satisfies a fundamental inequality known as the Cauchy Schwarz inequality. 

Theorem 3.1.5 The dot product satisfies the inequality 

$$|\mathbf{a} \cdot \mathbf{b}| \leq |\mathbf{a}| |\mathbf{b}|. \tag{3.6}$$

Furthermore equality is obtained if and only if one of $\mathbf{a}$ or $\mathbf{b}$ is a scalar multiple of the other. 

Proof: First note that if $\mathbf{b} = \mathbf{0}$, both sides of 3.6 equal zero and so the inequality holds in this 
case. Therefore, it will be assumed in what follows that $\mathbf{b} \neq \mathbf{0}$. 
Define a function of $t \in \mathbb{R}$, 

$$f(t) = (\mathbf{a} + t\mathbf{b}) \cdot (\mathbf{a} + t\mathbf{b}).$$

Then by 3.2, $f(t) \geq 0$ for all $t \in \mathbb{R}$. Also from 3.3, 3.4, 3.1, and 3.5, 

$$\begin{aligned}
f(t) &= \mathbf{a} \cdot (\mathbf{a} + t\mathbf{b}) + t\mathbf{b} \cdot (\mathbf{a} + t\mathbf{b}) \\
&= \mathbf{a} \cdot \mathbf{a} + t(\mathbf{a} \cdot \mathbf{b}) + t\,\mathbf{b} \cdot \mathbf{a} + t^2\,\mathbf{b} \cdot \mathbf{b} \\
&= |\mathbf{a}|^2 + 2t(\mathbf{a} \cdot \mathbf{b}) + |\mathbf{b}|^2 t^2.
\end{aligned}$$

Now this means the graph $y = f(t)$ is a parabola which opens up and either its vertex touches 
the t axis or else the entire graph is above the t axis. In the first case, there exists some t where 
$f(t) = 0$ and this requires $\mathbf{a} + t\mathbf{b} = \mathbf{0}$ so one vector is a multiple of the other. Then clearly equality 
holds in 3.6. In the case where $\mathbf{a}$ is not a multiple of $\mathbf{b}$, it follows $f(t) > 0$ for all t which says $f(t)$ 
has no real zeros and so from the quadratic formula, 

$$(2(\mathbf{a} \cdot \mathbf{b}))^2 - 4|\mathbf{a}|^2 |\mathbf{b}|^2 < 0$$

which is equivalent to $|\mathbf{a} \cdot \mathbf{b}| < |\mathbf{a}| |\mathbf{b}|$. ■ 

You should note that the entire argument was based only on the properties of the dot product 
listed in 3.1 - 3.5. This means that whenever something satisfies these properties, the Cauchy 
Schwarz inequality holds. There are many other instances of these properties besides vectors in $\mathbb{R}^n$. 
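
To see the last remark in action, here is a small numerical sketch. It uses a weighted product on triples, which is an illustrative choice made here (the weights 1, 2, 3 and the function names are assumptions, not something fixed by the text). This product satisfies 3.1 - 3.5 with the length taken to be the square root of the product of a vector with itself, so the Cauchy Schwarz inequality should hold for it as well. A numerical check like this is only evidence, not a proof.

```python
import math
import random


def weighted_product(u, v, weights=(1.0, 2.0, 3.0)):
    """A product on R^3 with positive weights; it satisfies 3.1 - 3.5."""
    return sum(w * x * y for w, x, y in zip(weights, u, v))


def cauchy_schwarz_holds(u, v, product, tol=1e-9):
    """Check |u*v| <= (u*u)^(1/2) (v*v)^(1/2), allowing for rounding error."""
    lhs = abs(product(u, v))
    rhs = math.sqrt(product(u, u)) * math.sqrt(product(v, v))
    return lhs <= rhs + tol


random.seed(0)  # fixed seed so the sample is reproducible
samples = [[random.uniform(-5, 5) for _ in range(3)] for _ in range(200)]
pairs = list(zip(samples[::2], samples[1::2]))
print(all(cauchy_schwarz_holds(u, v, weighted_product) for u, v in pairs))
```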

The Cauchy Schwarz inequality allows a proof of the triangle inequality for distances in $\mathbb{R}^n$ 
in much the same way as the triangle inequality for the absolute value. 

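Theorem 3.1.6 (Triangle inequality) For $\mathbf{a}, \mathbf{b} \in \mathbb{R}^n$ 

$$|\mathbf{a} + \mathbf{b}| \leq |\mathbf{a}| + |\mathbf{b}| \tag{3.7}$$

and equality holds if and only if one of the vectors is a nonnegative scalar multiple of the other. Also 

$$||\mathbf{a}| - |\mathbf{b}|| \leq |\mathbf{a} - \mathbf{b}| \tag{3.8}$$

Proof: By properties of the dot product and the Cauchy Schwarz inequality, 

$$\begin{aligned}
|\mathbf{a} + \mathbf{b}|^2 &= (\mathbf{a} + \mathbf{b}) \cdot (\mathbf{a} + \mathbf{b}) \\
&= (\mathbf{a} \cdot \mathbf{a}) + (\mathbf{a} \cdot \mathbf{b}) + (\mathbf{b} \cdot \mathbf{a}) + (\mathbf{b} \cdot \mathbf{b}) \\
&= |\mathbf{a}|^2 + 2(\mathbf{a} \cdot \mathbf{b}) + |\mathbf{b}|^2 \\
&\leq |\mathbf{a}|^2 + 2|\mathbf{a} \cdot \mathbf{b}| + |\mathbf{b}|^2 \\
&\leq |\mathbf{a}|^2 + 2|\mathbf{a}||\mathbf{b}| + |\mathbf{b}|^2 \\
&= (|\mathbf{a}| + |\mathbf{b}|)^2.
\end{aligned}$$

Taking square roots of both sides you obtain 3.7. 

It remains to consider when equality occurs. If either vector equals zero, then that vector equals 
zero times the other vector and the claim about when equality occurs is verified. Therefore, it can 
be assumed both vectors are nonzero. To get equality in the second inequality above, Theorem 3.1.5 
implies one of the vectors must be a multiple of the other. Say $\mathbf{b} = \alpha \mathbf{a}$. If $\alpha < 0$ then equality 
cannot occur in the first inequality because in this case 

$$(\mathbf{a} \cdot \mathbf{b}) = \alpha |\mathbf{a}|^2 < 0 < |\alpha| |\mathbf{a}|^2 = |\mathbf{a} \cdot \mathbf{b}|.$$

Therefore, $\alpha \geq 0$. 

To get the other form of the triangle inequality, 

$$\mathbf{a} = \mathbf{a} - \mathbf{b} + \mathbf{b}$$

so 

$$|\mathbf{a}| = |\mathbf{a} - \mathbf{b} + \mathbf{b}| \leq |\mathbf{a} - \mathbf{b}| + |\mathbf{b}|.$$

Therefore, 

$$|\mathbf{a}| - |\mathbf{b}| \leq |\mathbf{a} - \mathbf{b}| \tag{3.9}$$

Similarly, 

$$|\mathbf{b}| - |\mathbf{a}| \leq |\mathbf{b} - \mathbf{a}| = |\mathbf{a} - \mathbf{b}|. \tag{3.10}$$

It follows from 3.9 and 3.10 that 3.8 holds. This is because $||\mathbf{a}| - |\mathbf{b}||$ equals the left side of either 
3.9 or 3.10 and either way, $||\mathbf{a}| - |\mathbf{b}|| \leq |\mathbf{a} - \mathbf{b}|$. ■ 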

3.2 The Geometric Significance Of The Dot Product 

3.2.1 The Angle Between Two Vectors 

Given two vectors, a and b, the included angle is the angle between these two vectors which is less 
than or equal to 180 degrees. The dot product can be used to determine the included angle between 
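two vectors. To see how to do this, consider the following picture. 

[Figure: the vectors a and b drawn from a common point with included angle θ; the third side of the triangle is a − b.] 

By the law of cosines, 

$$|\mathbf{a} - \mathbf{b}|^2 = |\mathbf{a}|^2 + |\mathbf{b}|^2 - 2|\mathbf{a}||\mathbf{b}| \cos\theta.$$

Also from the properties of the dot product, 

$$\begin{aligned}
|\mathbf{a} - \mathbf{b}|^2 &= (\mathbf{a} - \mathbf{b}) \cdot (\mathbf{a} - \mathbf{b}) \\
&= |\mathbf{a}|^2 + |\mathbf{b}|^2 - 2\,\mathbf{a} \cdot \mathbf{b}
\end{aligned}$$

and so comparing the above two formulas, 

$$\mathbf{a} \cdot \mathbf{b} = |\mathbf{a}||\mathbf{b}| \cos\theta. \tag{3.11}$$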

In words, the dot product of two vectors equals the product of the magnitude of the two vectors 
multiplied by the cosine of the included angle. Note this gives a geometric description of the dot 
product which does not depend explicitly on the coordinates of the vectors. 

Example 3.2.1 Find the angle between the vectors 2i + j − k and 3i + 4j + k. 

The dot product of these two vectors equals 6 + 4 − 1 = 9 and the norms are $\sqrt{4 + 1 + 1} = \sqrt{6}$ 
and $\sqrt{9 + 16 + 1} = \sqrt{26}$. Therefore, from 3.11 the cosine of the included angle equals 

$$\cos\theta = \frac{9}{\sqrt{26}\sqrt{6}} = .72058.$$

Now that the cosine is known, the angle can be determined by solving the equation $\cos\theta = .72058$. 
This will involve using a calculator or a table of trigonometric functions. The answer is $\theta = .76616$ 
radians or in terms of degrees, $\theta = .76616 \times \frac{360}{2\pi} = 43.898°$. Recall how this last computation is done. 
Set up a proportion, $\frac{x}{.76616} = \frac{360}{2\pi}$, because 360° corresponds to $2\pi$ radians. However, in calculus, you 
should get used to thinking in terms of radians and not degrees. This is because all the important 
calculus formulas are defined in terms of radians. 
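
The calculator step in Example 3.2.1 can be sketched in code as follows. This is a minimal illustration assuming nonzero vectors given as tuples; the function name `included_angle` is chosen here and is not from the text. The cosine is clamped to [-1, 1] because rounding can push it slightly outside that interval for nearly parallel vectors.

```python
import math


def included_angle(a, b):
    """Included angle in radians between nonzero vectors a and b, using 3.11."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("the included angle is not defined for the zero vector")
    cos_theta = dot / (norm_a * norm_b)
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding error
    return math.acos(cos_theta)


theta = included_angle((2, 1, -1), (3, 4, 1))  # the vectors of Example 3.2.1
print(theta, math.degrees(theta))              # radians, then degrees
```

`math.degrees` performs the same proportion described above, multiplying by 360 and dividing by 2π.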

Example 3.2.2 Let u, v be two vectors whose magnitudes are equal to 3 and 4 respectively and such 
that if they are placed in standard position with their tails at the origin, the angle between u and the 
positive x axis equals 30° and the angle between v and the positive x axis is −30°. Find $\mathbf{u} \cdot \mathbf{v}$. 

From the geometric description of the dot product in 3.11 

$$\mathbf{u} \cdot \mathbf{v} = 3 \times 4 \times \cos(60°) = 3 \times 4 \times 1/2 = 6.$$

Observation 3.2.3 Two vectors are said to be perpendicular if the included angle is $\pi/2$ radians 
(90°). You can tell if two nonzero vectors are perpendicular by simply taking their dot product. If 
the answer is zero, this means they are perpendicular because $\cos\theta = 0$. 

Example 3.2.4 Determine whether the two vectors, 2i + j − k and i + 3j + 5k are perpendicular. 

When you take this dot product you get 2 + 3 − 5 = 0 and so these two are indeed perpendicular. 

Definition 3.2.5 When two lines intersect, the angle between the two lines is the smaller of the two 
angles determined. 

Example 3.2.6 Find the angle between the two lines, $(1, 2, 0) + t(1, 2, 3)$ and $(0, 4, -3) + t(-1, 2, -3)$. 

These two lines intersect, when t = 0 in the first and t = −1 in the second. It is only a matter 
of finding the angle between the direction vectors. One angle determined is given by 

$$\cos\theta = \frac{-6}{14} = \frac{-3}{7}. \tag{3.12}$$

We don't want this angle because it is obtuse. The angle desired is the acute angle given by 

$$\cos\theta = \frac{3}{7}.$$

It is obtained by replacing one of the direction vectors with −1 times it. 
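
Replacing a direction vector by −1 times it only changes the sign of the dot product, so the acute angle can be obtained by taking the absolute value of the dot product. Here is a minimal sketch under that observation; the function name `angle_between_lines` is an illustrative choice, and the lines are assumed to intersect and to have nonzero direction vectors.

```python
import math


def angle_between_lines(d1, d2):
    """Smaller angle (radians) between two intersecting lines with directions d1, d2."""
    dot = sum(x * y for x, y in zip(d1, d2))
    norm1 = math.sqrt(sum(x * x for x in d1))
    norm2 = math.sqrt(sum(y * y for y in d2))
    # |dot| picks whichever of d2, -d2 makes the angle at most pi/2.
    cos_theta = min(1.0, abs(dot) / (norm1 * norm2))
    return math.acos(cos_theta)


angle = angle_between_lines((1, 2, 3), (-1, 2, -3))  # directions in Example 3.2.6
print(math.cos(angle))  # compare with 3/7
```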

3.2.2 Work And Projections 

Our first application will be to the concept of work. The physical concept of work does not in any 
way correspond to the notion of work employed in ordinary conversation. For example, if you were 
to slide a 150 pound weight off a table which is three feet high and shuffle along the floor for 50 
yards, sweating profusely and exerting all your strength to keep the weight from falling on your 
feet, keeping the height always three feet and then deposit this weight on another three foot high 
table, the physical concept of work would indicate that the force exerted by your arms did no work 
during this project even though the muscles in your hands and arms would likely be very tired. The 
reason for such an unusual definition is that even though your arms exerted considerable force on 
the weight, enough to keep it from falling, the direction of motion was at right angles to the force 
they exerted. The only part of a force which does work in the sense of physics is the component 
of the force in the direction of motion (This is made more precise below.). The work is defined to 
be the magnitude of the component of this force times the distance over which it acts in the case 
where this component of force points in the direction of motion and (—1) times the magnitude of 
this component times the distance in case the force tends to impede the motion. Thus the work 
done by a force on an object as the object moves from one point to another is a measure of the 
extent to which the force contributes to the motion. This is illustrated in the following picture in 
the case where the given force contributes to the motion. 




[Figure: a force F applied to an object moving from p₁ to p₂, decomposed into F∥ along the direction of motion and F⊥ perpendicular to it, with included angle θ.] 

In this picture the force, $\mathbf{F}$ is applied to an object which moves on the straight line from $\mathbf{p}_1$ to $\mathbf{p}_2$. 
There are two vectors shown, $\mathbf{F}_{\parallel}$ and $\mathbf{F}_{\perp}$ and the picture is intended to indicate that when you add 
these two vectors you get $\mathbf{F}$ while $\mathbf{F}_{\parallel}$ acts in the direction of motion and $\mathbf{F}_{\perp}$ acts perpendicular to 
the direction of motion. Only $\mathbf{F}_{\parallel}$ contributes to the work done by $\mathbf{F}$ on the object as it moves from 
$\mathbf{p}_1$ to $\mathbf{p}_2$. $\mathbf{F}_{\parallel}$ is called the component of the force in the direction of motion. From trigonometry, 
you see the magnitude of $\mathbf{F}_{\parallel}$ should equal $|\mathbf{F}| |\cos\theta|$. Thus, since $\mathbf{F}_{\parallel}$ points in the direction of the 
vector from $\mathbf{p}_1$ to $\mathbf{p}_2$, the total work done should equal 

$$|\mathbf{F}| \, |\overrightarrow{\mathbf{p}_1 \mathbf{p}_2}| \cos\theta = |\mathbf{F}| \, |\mathbf{p}_2 - \mathbf{p}_1| \cos\theta.$$

If the included angle had been obtuse, then the work done by the force, $\mathbf{F}$ on the object would have 
been negative because in this case, the force tends to impede the motion from $\mathbf{p}_1$ to $\mathbf{p}_2$ but in this 
case, $\cos\theta$ would also be negative and so it is still the case that the work done would be given by 
the above formula. Thus from the geometric description of the dot product given above, the work 
equals 

$$|\mathbf{F}| \, |\mathbf{p}_2 - \mathbf{p}_1| \cos\theta = \mathbf{F} \cdot (\mathbf{p}_2 - \mathbf{p}_1).$$

This explains the following definition. 

Definition 3.2.7 Let $\mathbf{F}$ be a force acting on an object which moves from the point $\mathbf{p}_1$ to the point 
$\mathbf{p}_2$. Then the work done on the object by the given force equals $\mathbf{F} \cdot (\mathbf{p}_2 - \mathbf{p}_1)$. 

The concept of writing a given vector F in terms of two vectors, one which is parallel to a 
given vector D and the other which is perpendicular can also be explained with no reliance on 
trigonometry, completely in terms of the algebraic properties of the dot product. As before, this is 
mathematically more significant than any approach involving geometry or trigonometry because it 
extends to more interesting situations. This is done next. 

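Theorem 3.2.8 Let $\mathbf{F}$ and $\mathbf{D}$ be nonzero vectors. Then there exist unique vectors $\mathbf{F}_{\parallel}$ and $\mathbf{F}_{\perp}$ such 
that 

$$\mathbf{F} = \mathbf{F}_{\parallel} + \mathbf{F}_{\perp} \tag{3.13}$$

where $\mathbf{F}_{\parallel}$ is a scalar multiple of $\mathbf{D}$, also referred to as 

$$\operatorname{proj}_{\mathbf{D}}(\mathbf{F}),$$

and $\mathbf{F}_{\perp} \cdot \mathbf{D} = 0$. The vector $\operatorname{proj}_{\mathbf{D}}(\mathbf{F})$ is called the projection of $\mathbf{F}$ onto $\mathbf{D}$. 

Proof: Suppose 3.13 and $\mathbf{F}_{\parallel} = \alpha \mathbf{D}$. Taking the dot product of both sides with $\mathbf{D}$ and using 
$\mathbf{F}_{\perp} \cdot \mathbf{D} = 0$, this yields 

$$\mathbf{F} \cdot \mathbf{D} = \alpha |\mathbf{D}|^2$$

which requires $\alpha = \mathbf{F} \cdot \mathbf{D} / |\mathbf{D}|^2$. Thus there can be no more than one vector $\mathbf{F}_{\parallel}$. It follows $\mathbf{F}_{\perp}$ must 
equal $\mathbf{F} - \mathbf{F}_{\parallel}$. This verifies there can be no more than one choice for both $\mathbf{F}_{\parallel}$ and $\mathbf{F}_{\perp}$. 

Now let 

$$\mathbf{F}_{\parallel} = \frac{\mathbf{F} \cdot \mathbf{D}}{|\mathbf{D}|^2} \mathbf{D}$$

and let 

$$\mathbf{F}_{\perp} = \mathbf{F} - \mathbf{F}_{\parallel} = \mathbf{F} - \frac{\mathbf{F} \cdot \mathbf{D}}{|\mathbf{D}|^2} \mathbf{D}.$$

Then $\mathbf{F}_{\parallel} = \alpha \mathbf{D}$ where $\alpha = \frac{\mathbf{F} \cdot \mathbf{D}}{|\mathbf{D}|^2}$. It only remains to verify $\mathbf{F}_{\perp} \cdot \mathbf{D} = 0$. But 

$$\begin{aligned}
\mathbf{F}_{\perp} \cdot \mathbf{D} &= \mathbf{F} \cdot \mathbf{D} - \frac{\mathbf{F} \cdot \mathbf{D}}{|\mathbf{D}|^2} \mathbf{D} \cdot \mathbf{D} \\
&= \mathbf{F} \cdot \mathbf{D} - \mathbf{F} \cdot \mathbf{D} = 0. \; ■
\end{aligned}$$

Example 3.2.9 Let $\mathbf{F} = 2\mathbf{i} + 7\mathbf{j} - 3\mathbf{k}$ Newtons. Find the work done by this force in moving from 
the point (1, 2, 3) to the point (−9, −3, 4) along the straight line segment joining these points where 
distances are measured in meters. 

According to the definition, this work is 

$$(2\mathbf{i} + 7\mathbf{j} - 3\mathbf{k}) \cdot (-10\mathbf{i} - 5\mathbf{j} + \mathbf{k}) = -20 + (-35) + (-3) = -58 \text{ Newton meters.}$$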

Note that if the force had been given in pounds and the distance had been given in feet, the 
units on the work would have been foot pounds. In general, work has units equal to units of a force 
times units of a length. Instead of writing Newton meter, people write joule because a joule is by 
definition a Newton meter. That word is pronounced "jewel" and it is the unit of work in the metric 
system of units. Also be sure you observe that the work done by the force can be negative as in 
the above example. In fact, work can be either positive, negative, or zero. You just have to do the 
computations to find out. 
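
Definition 3.2.7 translates directly into code. The following is a minimal sketch, assuming the force and the two points are given as tuples in compatible units; the function name `work` is chosen here for illustration.

```python
def work(force, start, end):
    """Work done by a constant force on an object moving from start to end.

    Implements F . (p2 - p1) from Definition 3.2.7.
    """
    displacement = [q - p for p, q in zip(start, end)]
    return sum(f * d for f, d in zip(force, displacement))


# Force and endpoints of Example 3.2.9 (Newtons and meters).
print(work((2, 7, -3), (1, 2, 3), (-9, -3, 4)))
```

The sign of the result carries the meaning discussed above: a negative value says the force tended to impede the motion.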

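Example 3.2.10 Find $\operatorname{proj}_{\mathbf{u}}(\mathbf{v})$ if $\mathbf{u} = 2\mathbf{i} + 3\mathbf{j} - 4\mathbf{k}$ and $\mathbf{v} = \mathbf{i} - 2\mathbf{j} + \mathbf{k}$. 

From the above discussion in Theorem 3.2.8, this is just 

$$\begin{aligned}
&\frac{1}{4 + 9 + 16} \left( (\mathbf{i} - 2\mathbf{j} + \mathbf{k}) \cdot (2\mathbf{i} + 3\mathbf{j} - 4\mathbf{k}) \right) (2\mathbf{i} + 3\mathbf{j} - 4\mathbf{k}) \\
&= \frac{-8}{29} (2\mathbf{i} + 3\mathbf{j} - 4\mathbf{k}) = -\frac{16}{29}\mathbf{i} - \frac{24}{29}\mathbf{j} + \frac{32}{29}\mathbf{k}.
\end{aligned}$$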
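
The formula from Theorem 3.2.8 can be sketched in code. This minimal example assumes integer components so that exact fractions can be used, and the function name `proj` is an illustrative choice; with floating point inputs one would divide normally instead of using `Fraction`.

```python
from fractions import Fraction


def proj(u, v):
    """Projection of v onto the nonzero vector u: ((v . u) / |u|^2) u.

    Assumes integer components so the result is exact.
    """
    u_dot_u = sum(x * x for x in u)
    if u_dot_u == 0:
        raise ValueError("cannot project onto the zero vector")
    scale = Fraction(sum(x * y for x, y in zip(v, u)), u_dot_u)
    return tuple(scale * x for x in u)


u = (2, 3, -4)
v = (1, -2, 1)
p = proj(u, v)
print(p)

# The remainder v - proj_u(v) should be perpendicular to u.
remainder = tuple(b - a for a, b in zip(p, v))
print(sum(r * x for r, x in zip(remainder, u)))
```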

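Example 3.2.11 Suppose $\mathbf{a}$ and $\mathbf{b}$ are vectors and $\mathbf{b}_{\perp} = \mathbf{b} - \operatorname{proj}_{\mathbf{a}}(\mathbf{b})$. What is the magnitude of 
$\mathbf{b}_{\perp}$ in terms of the included angle? 

$$\begin{aligned}
|\mathbf{b}_{\perp}|^2 &= (\mathbf{b} - \operatorname{proj}_{\mathbf{a}}(\mathbf{b})) \cdot (\mathbf{b} - \operatorname{proj}_{\mathbf{a}}(\mathbf{b})) \\
&= \left( \mathbf{b} - \frac{\mathbf{b} \cdot \mathbf{a}}{|\mathbf{a}|^2} \mathbf{a} \right) \cdot \left( \mathbf{b} - \frac{\mathbf{b} \cdot \mathbf{a}}{|\mathbf{a}|^2} \mathbf{a} \right) \\
&= |\mathbf{b}|^2 - 2\frac{(\mathbf{b} \cdot \mathbf{a})^2}{|\mathbf{a}|^2} + \left( \frac{\mathbf{b} \cdot \mathbf{a}}{|\mathbf{a}|^2} \right)^2 |\mathbf{a}|^2 \\
&= |\mathbf{b}|^2 \left( 1 - \frac{(\mathbf{b} \cdot \mathbf{a})^2}{|\mathbf{a}|^2 |\mathbf{b}|^2} \right) \\
&= |\mathbf{b}|^2 (1 - \cos^2\theta) = |\mathbf{b}|^2 \sin^2(\theta)
\end{aligned}$$

where $\theta$ is the included angle between $\mathbf{a}$ and $\mathbf{b}$ which is less than $\pi$ radians. Therefore, taking 
square roots, 

$$|\mathbf{b}_{\perp}| = |\mathbf{b}| \sin\theta.$$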



3.2.3 The Inner Product And Distance In $\mathbb{C}^n$ 

It is necessary to give a generalization of the dot product for vectors in $\mathbb{C}^n$. This is often called the 
inner product. It reduces to the definition of the dot product in the case the components of the 
vector are real. 

Definition 3.2.12 Let $\mathbf{x}, \mathbf{y} \in \mathbb{C}^n$. Thus $\mathbf{x} = (x_1, \cdots, x_n)$ where each $x_k \in \mathbb{C}$ and a similar formula 
holding for $\mathbf{y}$. Then the inner product of these two vectors is defined to be 

$$\mathbf{x} \cdot \mathbf{y} = \sum_{j} x_j \overline{y_j} = x_1 \overline{y_1} + \cdots + x_n \overline{y_n}.$$

The inner product is often denoted as $(\mathbf{x}, \mathbf{y})$ or $\langle \mathbf{x}, \mathbf{y} \rangle$. 

Notice how you put the conjugate on the entries of the vector $\mathbf{y}$. It makes no difference if the 
vectors happen to be real vectors but with complex vectors you must do it this way. The reason for 
this is that when you take the inner product of a vector with itself, you want to get the square of 
the length of the vector, a positive number. Placing the conjugate on the components of $\mathbf{y}$ in the 
above definition assures this will take place. Thus 

$$\mathbf{x} \cdot \mathbf{x} = \sum_{j} x_j \overline{x_j} = \sum_{j} |x_j|^2 \geq 0.$$

If you didn't place a conjugate as in the above definition, things wouldn't work out correctly. For 
example, 

$$(1 + i)^2 + 2^2 = 4 + 2i$$

and this is not a positive number. 
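
The role of the conjugate can be seen numerically. The sketch below uses Python's built-in complex numbers; the function names `inner` and `unconjugated` are illustrative choices, and `unconjugated` is included only to show what goes wrong without the conjugate.

```python
def inner(x, y):
    """Inner product on C^n: the sum of x_j times the conjugate of y_j."""
    return sum(complex(a) * complex(b).conjugate() for a, b in zip(x, y))


def unconjugated(x, y):
    """The same sum without the conjugate, shown only for comparison."""
    return sum(complex(a) * complex(b) for a, b in zip(x, y))


x = (1 + 1j, 2)
print(inner(x, x))         # a nonnegative real number, |1 + i|^2 + |2|^2
print(unconjugated(x, x))  # (1 + i)^2 + 2^2, which is not real
```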

The following properties of the inner product follow immediately from the definition and you 
should verify each of them. 

Properties of the inner product: 

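1. $\mathbf{u} \cdot \mathbf{v} = \overline{\mathbf{v} \cdot \mathbf{u}}$. 

2. If a, b are numbers and $\mathbf{u}, \mathbf{v}, \mathbf{z}$ are vectors then $(a\mathbf{u} + b\mathbf{v}) \cdot \mathbf{z} = a(\mathbf{u} \cdot \mathbf{z}) + b(\mathbf{v} \cdot \mathbf{z})$. 

3. $\mathbf{u} \cdot \mathbf{u} \geq 0$ and it equals 0 if and only if $\mathbf{u} = \mathbf{0}$. 

Note this implies $(\mathbf{x} \cdot \alpha\mathbf{y}) = \overline{\alpha}(\mathbf{x} \cdot \mathbf{y})$ because 

$$(\mathbf{x} \cdot \alpha\mathbf{y}) = \overline{(\alpha\mathbf{y} \cdot \mathbf{x})} = \overline{\alpha}\,\overline{(\mathbf{y} \cdot \mathbf{x})} = \overline{\alpha}(\mathbf{x} \cdot \mathbf{y}).$$

The norm is defined in the usual way. 

Definition 3.2.13 For $\mathbf{x} \in \mathbb{C}^n$, 

$$|\mathbf{x}| = \left( \sum_{k=1}^{n} |x_k|^2 \right)^{1/2} = (\mathbf{x} \cdot \mathbf{x})^{1/2}.$$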

Here is a fundamental inequality called the Cauchy Schwarz inequality which is stated here 
in $\mathbb{C}^n$. First here is a simple lemma. 

Lemma 3.2.14 If $z \in \mathbb{C}$ there exists $\theta \in \mathbb{C}$ such that $\theta z = |z|$ and $|\theta| = 1$. 

Proof: Let $\theta = 1$ if $z = 0$ and otherwise, let $\theta = \frac{\overline{z}}{|z|}$. Recall that for $z = x + iy$, $\overline{z} = x - iy$ and 
$z\overline{z} = |z|^2$. ■ 

I will give a proof of this important inequality which depends only on the above list of properties 
of the inner product. It will be slightly different than the earlier proof. 

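Theorem 3.2.15 (Cauchy Schwarz) The following inequality holds for $\mathbf{x}$ and $\mathbf{y} \in \mathbb{C}^n$. 

$$|(\mathbf{x} \cdot \mathbf{y})| \leq (\mathbf{x} \cdot \mathbf{x})^{1/2} (\mathbf{y} \cdot \mathbf{y})^{1/2} \tag{3.14}$$

Equality holds in this inequality if and only if one vector is a multiple of the other. 

Proof: Let $\theta \in \mathbb{C}$ such that $|\theta| = 1$ and 

$$\theta (\mathbf{x} \cdot \mathbf{y}) = |(\mathbf{x} \cdot \mathbf{y})|.$$

Consider $p(t) = (\mathbf{x} + \overline{\theta} t \mathbf{y}) \cdot (\mathbf{x} + t \overline{\theta} \mathbf{y})$ where $t \in \mathbb{R}$. Then from the above list of properties of the dot 
product, 

$$\begin{aligned}
0 \leq p(t) &= (\mathbf{x} \cdot \mathbf{x}) + t\theta (\mathbf{x} \cdot \mathbf{y}) + t\overline{\theta} (\mathbf{y} \cdot \mathbf{x}) + t^2 (\mathbf{y} \cdot \mathbf{y}) \\
&= (\mathbf{x} \cdot \mathbf{x}) + t\theta (\mathbf{x} \cdot \mathbf{y}) + t\overline{\theta (\mathbf{x} \cdot \mathbf{y})} + t^2 (\mathbf{y} \cdot \mathbf{y}) \\
&= (\mathbf{x} \cdot \mathbf{x}) + 2t \operatorname{Re}(\theta (\mathbf{x} \cdot \mathbf{y})) + t^2 (\mathbf{y} \cdot \mathbf{y}) \\
&= (\mathbf{x} \cdot \mathbf{x}) + 2t |(\mathbf{x} \cdot \mathbf{y})| + t^2 (\mathbf{y} \cdot \mathbf{y})
\end{aligned} \tag{3.15}$$

and this must hold for all $t \in \mathbb{R}$. Therefore, if $(\mathbf{y} \cdot \mathbf{y}) = 0$ it must be the case that $|(\mathbf{x} \cdot \mathbf{y})| = 0$ also 
since otherwise the above inequality would be violated. Therefore, in this case, 

$$|(\mathbf{x} \cdot \mathbf{y})| \leq (\mathbf{x} \cdot \mathbf{x})^{1/2} (\mathbf{y} \cdot \mathbf{y})^{1/2}.$$

On the other hand, if $(\mathbf{y} \cdot \mathbf{y}) \neq 0$, then $p(t) \geq 0$ for all t means the graph of $y = p(t)$ is a parabola 
which opens up and it either has exactly one real zero in the case its vertex touches the t axis or it 
has no real zeros. 

[Figure: the two possible graphs of p(t), a parabola opening upward which either touches the t axis at its vertex or lies entirely above it.] 

From the quadratic formula this happens exactly when 

$$4|(\mathbf{x} \cdot \mathbf{y})|^2 - 4(\mathbf{x} \cdot \mathbf{x})(\mathbf{y} \cdot \mathbf{y}) \leq 0$$

which is equivalent to 3.14. 

It is clear from a computation that if one vector is a scalar multiple of the other that equality 
holds in 3.14. Conversely, suppose equality does hold. Then this is equivalent to saying $4|(\mathbf{x} \cdot \mathbf{y})|^2 - 
4(\mathbf{x} \cdot \mathbf{x})(\mathbf{y} \cdot \mathbf{y}) = 0$ and so from the quadratic formula, there exists one real zero to $p(t) = 0$. Call it 
$t_0$. Then 

$$p(t_0) = (\mathbf{x} + \overline{\theta} t_0 \mathbf{y}) \cdot (\mathbf{x} + t_0 \overline{\theta} \mathbf{y}) = |\mathbf{x} + \overline{\theta} t_0 \mathbf{y}|^2 = 0$$

and so $\mathbf{x} = -\overline{\theta} t_0 \mathbf{y}$. ■ 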

Note that I only used part of the above properties of the inner product. It was not necessary to 
use the one which says that if $(\mathbf{x} \cdot \mathbf{x}) = 0$ then $\mathbf{x} = \mathbf{0}$. 

By analogy to the case of $\mathbb{R}^n$, length or magnitude of vectors in $\mathbb{C}^n$ can be defined. 

Definition 3.2.16 Let $\mathbf{z} \in \mathbb{C}^n$. Then $|\mathbf{z}| = (\mathbf{z} \cdot \mathbf{z})^{1/2}$. 

The conclusions of the following theorem are also called the axioms for a norm. 

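Theorem 3.2.17 For length defined in Definition 3.2.16, the following hold. 

$$|\mathbf{z}| \geq 0 \text{ and } |\mathbf{z}| = 0 \text{ if and only if } \mathbf{z} = \mathbf{0} \tag{3.16}$$

$$\text{If } \alpha \text{ is a scalar, } |\alpha \mathbf{z}| = |\alpha| |\mathbf{z}| \tag{3.17}$$

$$|\mathbf{z} + \mathbf{w}| \leq |\mathbf{z}| + |\mathbf{w}|. \tag{3.18}$$

Proof: The first two claims are left as exercises. To establish the third, you use the same 
argument which was used in $\mathbb{R}^n$. 

$$\begin{aligned}
|\mathbf{z} + \mathbf{w}|^2 &= (\mathbf{z} + \mathbf{w}) \cdot (\mathbf{z} + \mathbf{w}) \\
&= \mathbf{z} \cdot \mathbf{z} + \mathbf{w} \cdot \mathbf{w} + \mathbf{w} \cdot \mathbf{z} + \mathbf{z} \cdot \mathbf{w} \\
&= |\mathbf{z}|^2 + |\mathbf{w}|^2 + 2 \operatorname{Re}(\mathbf{w} \cdot \mathbf{z}) \\
&\leq |\mathbf{z}|^2 + |\mathbf{w}|^2 + 2|\mathbf{w} \cdot \mathbf{z}| \\
&\leq |\mathbf{z}|^2 + |\mathbf{w}|^2 + 2|\mathbf{w}||\mathbf{z}| = (|\mathbf{z}| + |\mathbf{w}|)^2.
\end{aligned}$$

Taking square roots of both sides gives 3.18. ■ 

Occasionally, I may refer to the inner product in $\mathbb{C}^n$ as the dot product. They are the same thing 
for $\mathbb{R}^n$. However, it is convenient to draw a distinction when discussing matrix multiplication a little 
later. 

3.3 Exercises 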

1. Use formula 3.11 to verify the Cauchy Schwarz inequality and to show that equality occurs if 
and only if one of the vectors is a scalar multiple of the other. 

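2. For $\mathbf{u}, \mathbf{v}$ vectors in $\mathbb{R}^3$, define the product, $\mathbf{u} * \mathbf{v} = u_1 v_1 + 2u_2 v_2 + 3u_3 v_3$. Show the axioms for 
a dot product all hold for this funny product. Prove 

$$|\mathbf{u} * \mathbf{v}| \leq (\mathbf{u} * \mathbf{u})^{1/2} (\mathbf{v} * \mathbf{v})^{1/2}.$$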

Hint: Do not try to do this with methods from trigonometry. 

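3. Find the angle between the vectors 3i − j − k and i + 4j + 2k. 

4. Find the angle between the vectors i − 2j + k and i + 2j − 7k. 

5. Find $\operatorname{proj}_{\mathbf{u}}(\mathbf{v})$ where $\mathbf{v} = (1, 0, -2)$ and $\mathbf{u} = (1, 2, 3)$. 

6. Find $\operatorname{proj}_{\mathbf{u}}(\mathbf{v})$ where $\mathbf{v} = (1, 2, -2)$ and $\mathbf{u} = (1, 0, 3)$. 

7. Find $\operatorname{proj}_{\mathbf{u}}(\mathbf{v})$ where $\mathbf{v} = (1, 2, -2, 1)$ and $\mathbf{u} = (1, 2, 3, 0)$. 

8. Does it make sense to speak of $\operatorname{proj}_{\mathbf{0}}(\mathbf{v})$? 

9. If $\mathbf{F}$ is a force and $\mathbf{D}$ is a vector, show $\operatorname{proj}_{\mathbf{D}}(\mathbf{F}) = (|\mathbf{F}| \cos\theta)\,\mathbf{u}$ where $\mathbf{u}$ is the unit vector in 
the direction of $\mathbf{D}$, $\mathbf{u} = \mathbf{D}/|\mathbf{D}|$ and $\theta$ is the included angle between the two vectors, $\mathbf{F}$ and $\mathbf{D}$. 
$|\mathbf{F}| \cos\theta$ is sometimes called the component of the force, $\mathbf{F}$ in the direction, $\mathbf{D}$. 

10. Prove the Cauchy Schwarz inequality in $\mathbb{R}^n$ as follows. For $\mathbf{u}, \mathbf{v}$ vectors, consider 

$$(\mathbf{u} - \operatorname{proj}_{\mathbf{v}} \mathbf{u}) \cdot (\mathbf{u} - \operatorname{proj}_{\mathbf{v}} \mathbf{u}) \geq 0.$$

Now simplify using the axioms of the dot product and then put in the formula for the projection. 
Of course this expression equals 0 and you get equality in the Cauchy Schwarz inequality if 
and only if $\mathbf{u} = \operatorname{proj}_{\mathbf{v}} \mathbf{u}$. What is the geometric meaning of $\mathbf{u} = \operatorname{proj}_{\mathbf{v}} \mathbf{u}$? 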

11. A boy drags a sled for 100 feet along the ground by pulling on a rope which is 20 degrees from 
the horizontal with a force of 40 pounds. How much work does this force do? 

12. A girl drags a sled for 200 feet along the ground by pulling on a rope which is 30 degrees from 
the horizontal with a force of 20 pounds. How much work does this force do? 

13. A large dog drags a sled for 300 feet along the ground by pulling on a rope which is 45 degrees 
from the horizontal with a force of 20 pounds. How much work does this force do? 

14. How much work in Newton meters does it take to slide a crate 20 meters along a loading dock 
by pulling on it with a 200 Newton force at an angle of 30° from the horizontal? 

15. An object moves 10 meters in the direction of j. There are two forces acting on this object, 
$\mathbf{F}_1 = \mathbf{i} + \mathbf{j} + 2\mathbf{k}$, and $\mathbf{F}_2 = -5\mathbf{i} + 2\mathbf{j} - 6\mathbf{k}$. Find the total work done on the object by the two 
forces. Hint: You can take the work done by the resultant of the two forces or you can add 
the work done by each force. Why? 

16. An object moves 10 meters in the direction of j + i. There are two forces acting on this object, 
$\mathbf{F}_1 = \mathbf{i} + 2\mathbf{j} + 2\mathbf{k}$, and $\mathbf{F}_2 = 5\mathbf{i} + 2\mathbf{j} - 6\mathbf{k}$. Find the total work done on the object by the two 
forces. Hint: You can take the work done by the resultant of the two forces or you can add 
the work done by each force. Why? 

17. An object moves 20 meters in the direction of k + j. There are two forces acting on this object, 
$\mathbf{F}_1 = \mathbf{i} + \mathbf{j} + 2\mathbf{k}$, and $\mathbf{F}_2 = \mathbf{i} + 2\mathbf{j} - 6\mathbf{k}$. Find the total work done on the object by the two forces. 
Hint: You can take the work done by the resultant of the two forces or you can add the work 
done by each force. 

18. Let $\mathbf{a}, \mathbf{b}, \mathbf{c}$ be vectors. Show that $(\mathbf{b} + \mathbf{c})_{\perp} = \mathbf{b}_{\perp} + \mathbf{c}_{\perp}$ where $\mathbf{b}_{\perp} = \mathbf{b} - \operatorname{proj}_{\mathbf{a}}(\mathbf{b})$. 

19. Find $(1, 2, 3, 4) \cdot (2, 0, 1, 3)$. 

20. Show that $(\mathbf{a} \cdot \mathbf{b}) = \frac{1}{4}\left[ |\mathbf{a} + \mathbf{b}|^2 - |\mathbf{a} - \mathbf{b}|^2 \right]$. 

21. Prove from the axioms of the dot product the parallelogram identity, $|\mathbf{a} + \mathbf{b}|^2 + |\mathbf{a} - \mathbf{b}|^2 = 
2|\mathbf{a}|^2 + 2|\mathbf{b}|^2$. 

3.4 The Cross Product 

The cross product is the other way of multiplying two vectors in $\mathbb{R}^3$. It is very different from the 
dot product in many ways. First the geometric meaning is discussed and then a description in 
terms of coordinates is given. Both descriptions of the cross product are important. The geometric 
description is essential in order to understand the applications to physics and geometry while the 
coordinate description is the only way to practically compute the cross product. 

Definition 3.4.1 Three vectors, a, b, c form a right handed system if when you extend the fingers 
of your right hand along the vector a and close them in the direction of b, the thumb points roughly 
in the direction of c. 

For an example of a right handed system of vectors, see the following picture. 





In this picture the vector c points upwards from the plane determined by the other two vectors. 
You should consider how a right hand system would differ from a left hand system. Try using your 
left hand and you will see that the vector c would need to point in the opposite direction as it would 
for a right hand system. 

From now on, the vectors, i, j, k will always form a right handed system. To repeat, if you extend 
the fingers of your right hand along i and close them in the direction j, the thumb points in the 
direction of k. 

The following is the geometric description of the cross product. It gives both the direction and 
the magnitude and therefore specifies the vector. 

Definition 3.4.2 Let $\mathbf{a}$ and $\mathbf{b}$ be two vectors in $\mathbb{R}^3$. Then $\mathbf{a} \times \mathbf{b}$ is defined by the following two 
rules. 

1. $|\mathbf{a} \times \mathbf{b}| = |\mathbf{a}| |\mathbf{b}| \sin\theta$ where $\theta$ is the included angle. 

2. $\mathbf{a} \times \mathbf{b} \cdot \mathbf{a} = 0$, $\mathbf{a} \times \mathbf{b} \cdot \mathbf{b} = 0$, and $\mathbf{a}, \mathbf{b}, \mathbf{a} \times \mathbf{b}$ forms a right hand system. 

Note that $|\mathbf{a} \times \mathbf{b}|$ is the area of the parallelogram determined by $\mathbf{a}$ and $\mathbf{b}$. 




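The cross product satisfies the following properties. 

$$\mathbf{a} \times \mathbf{b} = -(\mathbf{b} \times \mathbf{a}), \quad \mathbf{a} \times \mathbf{a} = \mathbf{0}, \tag{3.19}$$

For $\alpha$ a scalar, 

$$(\alpha \mathbf{a}) \times \mathbf{b} = \alpha (\mathbf{a} \times \mathbf{b}) = \mathbf{a} \times (\alpha \mathbf{b}), \tag{3.20}$$

For $\mathbf{a}$, $\mathbf{b}$, and $\mathbf{c}$ vectors, one obtains the distributive laws, 

$$\mathbf{a} \times (\mathbf{b} + \mathbf{c}) = \mathbf{a} \times \mathbf{b} + \mathbf{a} \times \mathbf{c}, \tag{3.21}$$

$$(\mathbf{b} + \mathbf{c}) \times \mathbf{a} = \mathbf{b} \times \mathbf{a} + \mathbf{c} \times \mathbf{a}. \tag{3.22}$$

Formula 3.19 follows immediately from the definition. The vectors $\mathbf{a} \times \mathbf{b}$ and $\mathbf{b} \times \mathbf{a}$ have the 
same magnitude, $|\mathbf{a}| |\mathbf{b}| \sin\theta$, and an application of the right hand rule shows they have opposite 
direction. Formula 3.20 is also fairly clear. If $\alpha$ is a nonnegative scalar, the direction of $(\alpha \mathbf{a}) \times \mathbf{b}$ is 
the same as the direction of $\mathbf{a} \times \mathbf{b}$, $\alpha (\mathbf{a} \times \mathbf{b})$ and $\mathbf{a} \times (\alpha \mathbf{b})$ while the magnitude is just $\alpha$ times the 
magnitude of $\mathbf{a} \times \mathbf{b}$ which is the same as the magnitude of $\alpha (\mathbf{a} \times \mathbf{b})$ and $\mathbf{a} \times (\alpha \mathbf{b})$. Using this yields 
equality in 3.20. In the case where $\alpha < 0$, everything works the same way except the vectors are all 
pointing in the opposite direction and you must multiply by $|\alpha|$ when comparing their magnitudes. 
The distributive laws are much harder to establish but the second follows from the first quite easily. 
Thus, assuming the first, and using 3.19, 

$$\begin{aligned}
(\mathbf{b} + \mathbf{c}) \times \mathbf{a} &= -\mathbf{a} \times (\mathbf{b} + \mathbf{c}) = -(\mathbf{a} \times \mathbf{b} + \mathbf{a} \times \mathbf{c}) \\
&= \mathbf{b} \times \mathbf{a} + \mathbf{c} \times \mathbf{a}.
\end{aligned}$$

A proof of the distributive law is given in a later section for those who are interested. 

Now from the definition of the cross product, 

$$\begin{array}{ll}
\mathbf{i} \times \mathbf{j} = \mathbf{k} & \mathbf{j} \times \mathbf{i} = -\mathbf{k} \\
\mathbf{k} \times \mathbf{i} = \mathbf{j} & \mathbf{i} \times \mathbf{k} = -\mathbf{j} \\
\mathbf{j} \times \mathbf{k} = \mathbf{i} & \mathbf{k} \times \mathbf{j} = -\mathbf{i}
\end{array}$$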

With this information, the following gives the coordinate description of the cross product. 

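Proposition 3.4.3 Let $\mathbf{a} = a_1 \mathbf{i} + a_2 \mathbf{j} + a_3 \mathbf{k}$ and $\mathbf{b} = b_1 \mathbf{i} + b_2 \mathbf{j} + b_3 \mathbf{k}$ be two vectors. Then 

$$\mathbf{a} \times \mathbf{b} = (a_2 b_3 - a_3 b_2)\,\mathbf{i} + (a_3 b_1 - a_1 b_3)\,\mathbf{j} + (a_1 b_2 - a_2 b_1)\,\mathbf{k}. \tag{3.23}$$

Proof: From the above table and the properties of the cross product listed, 

$$\begin{aligned}
&(a_1 \mathbf{i} + a_2 \mathbf{j} + a_3 \mathbf{k}) \times (b_1 \mathbf{i} + b_2 \mathbf{j} + b_3 \mathbf{k}) \\
&= a_1 b_2\, \mathbf{i} \times \mathbf{j} + a_1 b_3\, \mathbf{i} \times \mathbf{k} + a_2 b_1\, \mathbf{j} \times \mathbf{i} + a_2 b_3\, \mathbf{j} \times \mathbf{k} + a_3 b_1\, \mathbf{k} \times \mathbf{i} + a_3 b_2\, \mathbf{k} \times \mathbf{j} \\
&= a_1 b_2 \mathbf{k} - a_1 b_3 \mathbf{j} - a_2 b_1 \mathbf{k} + a_2 b_3 \mathbf{i} + a_3 b_1 \mathbf{j} - a_3 b_2 \mathbf{i} \\
&= (a_2 b_3 - a_3 b_2)\,\mathbf{i} + (a_3 b_1 - a_1 b_3)\,\mathbf{j} + (a_1 b_2 - a_2 b_1)\,\mathbf{k}
\end{aligned} \tag{3.24}$$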
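
The coordinate formula 3.23 is short enough to write out directly. Below is a minimal sketch, assuming vectors in three dimensions stored as tuples; the function name `cross` and the names `i`, `j`, `k` for the standard basis are chosen here for illustration. The prints reproduce the table of products of i, j, k given above.

```python
def cross(a, b):
    """Cross product of two vectors in R^3, using the coordinate formula 3.23."""
    a1, a2, a3 = a
    b1, b2, b3 = b
    return (a2 * b3 - a3 * b2,
            a3 * b1 - a1 * b3,
            a1 * b2 - a2 * b1)


i, j, k = (1, 0, 0), (0, 1, 0), (0, 0, 1)
print(cross(i, j), cross(j, i))  # k and -k
print(cross(k, i), cross(i, k))  # j and -j
print(cross(j, k), cross(k, j))  # i and -i
```

Since the formula was derived from the table together with 3.19 - 3.22, recovering the table from the formula is a useful sanity check that no sign was dropped.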



It is probably impossible for most people to remember 3.23. Fortunately, there is a somewhat 
easier way to remember it. Define the determinant of a 2 × 2 matrix as follows 

$$\begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc.$$

Then 

$$\mathbf{a} \times \mathbf{b} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \end{vmatrix} \tag{3.25}$$

where you expand the determinant along the top row. This yields 

$$\begin{aligned}
&\mathbf{i}\,(-1)^{1+1} \begin{vmatrix} a_2 & a_3 \\ b_2 & b_3 \end{vmatrix} + \mathbf{j}\,(-1)^{2+1} \begin{vmatrix} a_1 & a_3 \\ b_1 & b_3 \end{vmatrix} + \mathbf{k}\,(-1)^{3+1} \begin{vmatrix} a_1 & a_2 \\ b_1 & b_2 \end{vmatrix} \\
&= \mathbf{i} \begin{vmatrix} a_2 & a_3 \\ b_2 & b_3 \end{vmatrix} - \mathbf{j} \begin{vmatrix} a_1 & a_3 \\ b_1 & b_3 \end{vmatrix} + \mathbf{k} \begin{vmatrix} a_1 & a_2 \\ b_1 & b_2 \end{vmatrix}
\end{aligned}$$

Note that to get the scalar which multiplies i you take the determinant of what is left after deleting 
the first row and the first column and multiply by $(-1)^{1+1}$ because i is in the first row and the first 
column. Then you do the same thing for the j and k. In the case of the j there is a minus sign 
because j is in the first row and the second column and so $(-1)^{1+2} = -1$ while the k is multiplied 
by $(-1)^{3+1} = 1$. The above equals 

$$(a_2 b_3 - a_3 b_2)\,\mathbf{i} - (a_1 b_3 - a_3 b_1)\,\mathbf{j} + (a_1 b_2 - a_2 b_1)\,\mathbf{k} \tag{3.26}$$

which is the same as 3.24. There will be much more presented on determinants later. For now, 
consider this an introduction if you have not seen this topic. 

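Example 3.4.4 Find $(\mathbf{i} - \mathbf{j} + 2\mathbf{k}) \times (3\mathbf{i} - 2\mathbf{j} + \mathbf{k})$. 

Use 3.25 to compute this. 

$$\begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ 1 & -1 & 2 \\ 3 & -2 & 1 \end{vmatrix} = \begin{vmatrix} -1 & 2 \\ -2 & 1 \end{vmatrix} \mathbf{i} - \begin{vmatrix} 1 & 2 \\ 3 & 1 \end{vmatrix} \mathbf{j} + \begin{vmatrix} 1 & -1 \\ 3 & -2 \end{vmatrix} \mathbf{k} = 3\mathbf{i} + 5\mathbf{j} + \mathbf{k}.$$

Example 3.4.5 Find the area of the parallelogram determined by the vectors, 

$$(\mathbf{i} - \mathbf{j} + 2\mathbf{k}), \quad (3\mathbf{i} - 2\mathbf{j} + \mathbf{k}).$$

These are the same two vectors in Example 3.4.4. 

From Example 3.4.4 and the geometric description of the cross product, the area is just the norm 
of the vector obtained in Example 3.4.4. Thus the area is $\sqrt{9 + 25 + 1} = \sqrt{35}$. 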

Example 3.4.6 Find the area of the triangle determined by (1, 2, 3), (0, 2, 5), (5, 1, 2). 

This triangle is obtained by connecting the three points with lines. Picking (1, 2, 3) as a starting 
point, there are two displacement vectors, (−1, 0, 2) and (4, −1, −1) such that the given vector added 
to these displacement vectors gives the other two vectors. The area of the triangle is half the area of 
the parallelogram determined by (−1, 0, 2) and (4, −1, −1). Thus $(-1, 0, 2) \times (4, -1, -1) = (2, 7, 1)$ 
and so the area of the triangle is $\frac{1}{2}\sqrt{4 + 49 + 1} = \frac{3}{2}\sqrt{6}$. 

Observation 3.4.7 In general, if you have three points (vectors) in $\mathbb{R}^3$, $\mathbf{P}, \mathbf{Q}, \mathbf{R}$ the area of the 
triangle is given by 

$$\frac{1}{2} |(\mathbf{Q} - \mathbf{P}) \times (\mathbf{R} - \mathbf{P})|.$$
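
Observation 3.4.7 can be sketched as a short function. This is a minimal illustration assuming points in three dimensions given as tuples; the names `cross` and `triangle_area` are chosen here and are not from the text.

```python
import math


def cross(a, b):
    """Cross product in R^3 from the coordinate formula 3.23."""
    a1, a2, a3 = a
    b1, b2, b3 = b
    return (a2 * b3 - a3 * b2, a3 * b1 - a1 * b3, a1 * b2 - a2 * b1)


def triangle_area(p, q, r):
    """Area of the triangle with vertices p, q, r: half of |(q - p) x (r - p)|."""
    u = tuple(b - a for a, b in zip(p, q))
    v = tuple(b - a for a, b in zip(p, r))
    n = cross(u, v)
    return 0.5 * math.sqrt(sum(x * x for x in n))


# The three points of Example 3.4.6.
print(triangle_area((1, 2, 3), (0, 2, 5), (5, 1, 2)))
```

The choice of starting vertex does not matter, since any vertex gives a parallelogram with twice the area of the same triangle.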




3.4.1 The Distributive Law For The Cross Product 

This section gives a proof for 3.21, a fairly difficult topic. It is included here for the interested 
student. If you are satisfied with taking the distributive law on faith, it is not necessary to read this 
section. The proof given here is quite clever and follows the one given in [3]. Another approach, 
based on volumes of parallelepipeds is found in [14] and is discussed a little later. 

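Lemma 3.4.8 Let $\mathbf{b}$ and $\mathbf{c}$ be two vectors. Then $\mathbf{b} \times \mathbf{c} = \mathbf{b} \times \mathbf{c}_{\perp}$ where $\mathbf{c}_{\parallel} + \mathbf{c}_{\perp} = \mathbf{c}$ and $\mathbf{c}_{\perp} \cdot \mathbf{b} = 0$. 

Proof: Consider the following picture. 

[Figure: the vectors b and c with included angle θ, together with the component c⊥ of c perpendicular to b.] 

Now $\mathbf{c}_{\perp} = \mathbf{c} - \frac{\mathbf{c} \cdot \mathbf{b}}{|\mathbf{b}|^2} \mathbf{b}$ and so $\mathbf{c}_{\perp}$ is in the plane determined by $\mathbf{c}$ and $\mathbf{b}$. Therefore, from the 
geometric definition of the cross product, $\mathbf{b} \times \mathbf{c}$ and $\mathbf{b} \times \mathbf{c}_{\perp}$ have the same direction. Now, referring 
to the picture, 

$$|\mathbf{b} \times \mathbf{c}_{\perp}| = |\mathbf{b}| |\mathbf{c}_{\perp}| = |\mathbf{b}| |\mathbf{c}| \sin\theta = |\mathbf{b} \times \mathbf{c}|.$$

Therefore, $\mathbf{b} \times \mathbf{c}$ and $\mathbf{b} \times \mathbf{c}_{\perp}$ also have the same magnitude and so they are the same vector. ■ 

With this, the proof of the distributive law is in the following theorem. 

Theorem 3.4.9 Let $\mathbf{a}$, $\mathbf{b}$, and $\mathbf{c}$ be vectors in $\mathbb{R}^3$. Then 

$$\mathbf{a} \times (\mathbf{b} + \mathbf{c}) = \mathbf{a} \times \mathbf{b} + \mathbf{a} \times \mathbf{c} \tag{3.27}$$

Proof: Suppose first that $\mathbf{a} \cdot \mathbf{b} = \mathbf{a} \cdot \mathbf{c} = 0$. Now imagine $\mathbf{a}$ is a vector coming out of the page 
and let $\mathbf{b}$, $\mathbf{c}$ and $\mathbf{b} + \mathbf{c}$ be as shown in the following picture. 

[Figure: on the right, the parallelogram with sides b and c and diagonal b + c; on the left, the four sided figure formed by a × b, a × c, and a × (b + c).] 

Then $\mathbf{a} \times \mathbf{b}$, $\mathbf{a} \times (\mathbf{b} + \mathbf{c})$, and $\mathbf{a} \times \mathbf{c}$ are each vectors in the same plane, perpendicular to $\mathbf{a}$ as 
shown. Thus $\mathbf{a} \times \mathbf{c} \cdot \mathbf{c} = 0$, $\mathbf{a} \times (\mathbf{b} + \mathbf{c}) \cdot (\mathbf{b} + \mathbf{c}) = 0$, and $\mathbf{a} \times \mathbf{b} \cdot \mathbf{b} = 0$. This implies that to get 
$\mathbf{a} \times \mathbf{b}$ you move counterclockwise through an angle of $\pi/2$ radians from the vector $\mathbf{b}$. Similar 
relationships exist between the vectors $\mathbf{a} \times (\mathbf{b} + \mathbf{c})$ and $\mathbf{b} + \mathbf{c}$ and the vectors $\mathbf{a} \times \mathbf{c}$ and $\mathbf{c}$. Thus the 
angle between $\mathbf{a} \times \mathbf{b}$ and $\mathbf{a} \times (\mathbf{b} + \mathbf{c})$ is the same as the angle between $\mathbf{b} + \mathbf{c}$ and $\mathbf{b}$ and the angle 
between $\mathbf{a} \times \mathbf{c}$ and $\mathbf{a} \times (\mathbf{b} + \mathbf{c})$ is the same as the angle between $\mathbf{c}$ and $\mathbf{b} + \mathbf{c}$. In addition to this, 
since $\mathbf{a}$ is perpendicular to these vectors, 

$$|\mathbf{a} \times \mathbf{b}| = |\mathbf{a}| |\mathbf{b}|, \quad |\mathbf{a} \times (\mathbf{b} + \mathbf{c})| = |\mathbf{a}| |\mathbf{b} + \mathbf{c}|, \quad \text{and} \quad |\mathbf{a} \times \mathbf{c}| = |\mathbf{a}| |\mathbf{c}|.$$

Therefore, 

$$\frac{|\mathbf{a} \times (\mathbf{b} + \mathbf{c})|}{|\mathbf{b} + \mathbf{c}|} = \frac{|\mathbf{a} \times \mathbf{c}|}{|\mathbf{c}|} = \frac{|\mathbf{a} \times \mathbf{b}|}{|\mathbf{b}|} = |\mathbf{a}|$$

and so 

$$\frac{|\mathbf{a} \times (\mathbf{b} + \mathbf{c})|}{|\mathbf{a} \times \mathbf{c}|} = \frac{|\mathbf{b} + \mathbf{c}|}{|\mathbf{c}|}, \quad \frac{|\mathbf{a} \times (\mathbf{b} + \mathbf{c})|}{|\mathbf{a} \times \mathbf{b}|} = \frac{|\mathbf{b} + \mathbf{c}|}{|\mathbf{b}|}$$

showing the triangles making up the parallelogram on the right and the four sided figure on the left 
in the above picture are similar. It follows the four sided figure on the left is in fact a parallelogram 
and this implies the diagonal is the vector sum of the vectors on the sides, yielding 3.27. 

Now suppose it is not necessarily the case that $\mathbf{a} \cdot \mathbf{b} = \mathbf{a} \cdot \mathbf{c} = 0$. Then write $\mathbf{b} = \mathbf{b}_{\parallel} + \mathbf{b}_{\perp}$ where 
$\mathbf{b}_{\perp} \cdot \mathbf{a} = 0$. Similarly $\mathbf{c} = \mathbf{c}_{\parallel} + \mathbf{c}_{\perp}$. By the above lemma and what was just shown, 

$$\begin{aligned}
\mathbf{a} \times (\mathbf{b} + \mathbf{c}) &= \mathbf{a} \times (\mathbf{b} + \mathbf{c})_{\perp} = \mathbf{a} \times (\mathbf{b}_{\perp} + \mathbf{c}_{\perp}) \\
&= \mathbf{a} \times \mathbf{b}_{\perp} + \mathbf{a} \times \mathbf{c}_{\perp} = \mathbf{a} \times \mathbf{b} + \mathbf{a} \times \mathbf{c}. \; ■
\end{aligned}$$

The result of Problem 18 of the exercises 3.3 is used to go from the first to the second line. 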

3.4.2 The Box Product 

Definition 3.4.10 A parallelepiped determined by the three vectors, $\mathbf{a}$, $\mathbf{b}$, and $\mathbf{c}$ consists of 

$$\{ r\mathbf{a} + s\mathbf{b} + t\mathbf{c} : r, s, t \in [0, 1] \}.$$

That is, if you pick three numbers, r, s, and t each in [0, 1] and form $r\mathbf{a} + s\mathbf{b} + t\mathbf{c}$, then the collection 
of all such points is what is meant by the parallelepiped determined by these three vectors. 

The following is a picture of such a thing. 

[Figure: the parallelepiped determined by a, b, and c, with θ the angle between c and a × b.] 

You notice the area of the base of the parallelepiped, the parallelogram determined by the vectors, 
$\mathbf{a}$ and $\mathbf{b}$ has area equal to $|\mathbf{a} \times \mathbf{b}|$ while the altitude of the parallelepiped is $|\mathbf{c}| \cos\theta$ where $\theta$ is the 
angle shown in the picture between $\mathbf{c}$ and $\mathbf{a} \times \mathbf{b}$. Therefore, the volume of this parallelepiped is the 
area of the base times the altitude which is just 

$$|\mathbf{a} \times \mathbf{b}| |\mathbf{c}| \cos\theta = \mathbf{a} \times \mathbf{b} \cdot \mathbf{c}.$$

This expression is known as the box product and is sometimes written as $[\mathbf{a}, \mathbf{b}, \mathbf{c}]$. You should consider 
what happens if you interchange the $\mathbf{b}$ with the $\mathbf{c}$ or the $\mathbf{a}$ with the $\mathbf{c}$. You can see geometrically 
from drawing pictures that this merely introduces a minus sign. In any case the box product of three 
vectors always equals either the volume of the parallelepiped determined by the three vectors or else 
minus this volume. 

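Example 3.4.11 Find the volume of the parallelepiped determined by the vectors $\mathbf{i} + 2\mathbf{j} - 5\mathbf{k}$, $\mathbf{i} + 3\mathbf{j} - 6\mathbf{k}$, $3\mathbf{i} + 2\mathbf{j} + 3\mathbf{k}$.

According to the above discussion, pick any two of these, take the cross product and then take the dot product of this with the third of these vectors. The result will be either the desired volume or minus the desired volume.

$$(\mathbf{i} + 2\mathbf{j} - 5\mathbf{k}) \times (\mathbf{i} + 3\mathbf{j} - 6\mathbf{k}) = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ 1 & 2 & -5 \\ 1 & 3 & -6 \end{vmatrix} = 3\mathbf{i} + \mathbf{j} + \mathbf{k}$$

Now take the dot product of this vector with the third which yields

$$(3\mathbf{i} + \mathbf{j} + \mathbf{k}) \cdot (3\mathbf{i} + 2\mathbf{j} + 3\mathbf{k}) = 9 + 2 + 3 = 14.$$

This shows the volume of this parallelepiped is 14 cubic units.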
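If you would rather let a machine do the arithmetic, the computation in this example can be sketched in a few lines of Python. The helper names `cross`, `dot`, and `box` are made up for this illustration and are not taken from any library; vectors are stored as plain tuples of components.

```python
def cross(a, b):
    """Cross product of two vectors in R^3 given as 3-tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    """Dot product of two vectors with the same number of components."""
    return sum(x * y for x, y in zip(a, b))


def box(a, b, c):
    """Box product [a, b, c] = (a x b) . c (hypothetical helper name)."""
    return dot(cross(a, b), c)


a, b, c = (1, 2, -5), (1, 3, -6), (3, 2, 3)
print(cross(a, b))   # (3, 1, 1)
print(box(a, b, c))  # 14
print(box(a, c, b))  # -14, interchanging two vectors changes the sign
```

The volume is the absolute value of the box product, so either ordering gives 14 cubic units.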

There is a fundamental observation which comes directly from the geometric definitions of the 
cross product and the dot product. 

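Lemma 3.4.12 Let $\mathbf{a}$, $\mathbf{b}$, and $\mathbf{c}$ be vectors. Then $(\mathbf{a} \times \mathbf{b}) \cdot \mathbf{c} = \mathbf{a} \cdot (\mathbf{b} \times \mathbf{c})$.

Proof: This follows from observing that either $(\mathbf{a} \times \mathbf{b}) \cdot \mathbf{c}$ and $\mathbf{a} \cdot (\mathbf{b} \times \mathbf{c})$ both give the volume of the parallelepiped or they both give $-1$ times the volume. ■

Notation 3.4.13 The box product $\mathbf{a} \times \mathbf{b} \cdot \mathbf{c} = \mathbf{a} \cdot \mathbf{b} \times \mathbf{c}$ is denoted more compactly as $[\mathbf{a}, \mathbf{b}, \mathbf{c}]$.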

3.4.3 A Proof Of The Distributive Law 

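Here is another proof of the distributive law for the cross product. Let $\mathbf{x}$ be a vector. From the above observation,

$$\begin{aligned} \mathbf{x} \cdot \mathbf{a} \times (\mathbf{b} + \mathbf{c}) &= (\mathbf{x} \times \mathbf{a}) \cdot (\mathbf{b} + \mathbf{c}) \\ &= (\mathbf{x} \times \mathbf{a}) \cdot \mathbf{b} + (\mathbf{x} \times \mathbf{a}) \cdot \mathbf{c} \\ &= \mathbf{x} \cdot \mathbf{a} \times \mathbf{b} + \mathbf{x} \cdot \mathbf{a} \times \mathbf{c} \\ &= \mathbf{x} \cdot (\mathbf{a} \times \mathbf{b} + \mathbf{a} \times \mathbf{c}). \end{aligned}$$

Therefore,

$$\mathbf{x} \cdot [\mathbf{a} \times (\mathbf{b} + \mathbf{c}) - (\mathbf{a} \times \mathbf{b} + \mathbf{a} \times \mathbf{c})] = 0$$

for all $\mathbf{x}$. In particular, this holds for $\mathbf{x} = \mathbf{a} \times (\mathbf{b} + \mathbf{c}) - (\mathbf{a} \times \mathbf{b} + \mathbf{a} \times \mathbf{c})$, showing that $\mathbf{a} \times (\mathbf{b} + \mathbf{c}) = \mathbf{a} \times \mathbf{b} + \mathbf{a} \times \mathbf{c}$, and this proves the distributive law for the cross product another way.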



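Observation 3.4.14 Suppose you have three vectors, $\mathbf{u} = (a, b, c)$, $\mathbf{v} = (d, e, f)$, and $\mathbf{w} = (g, h, i)$. Then $\mathbf{u} \cdot \mathbf{v} \times \mathbf{w}$ is given by the following.

$$\begin{aligned} \mathbf{u} \cdot \mathbf{v} \times \mathbf{w} &= (a, b, c) \cdot \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ d & e & f \\ g & h & i \end{vmatrix} \\ &= a \begin{vmatrix} e & f \\ h & i \end{vmatrix} - b \begin{vmatrix} d & f \\ g & i \end{vmatrix} + c \begin{vmatrix} d & e \\ g & h \end{vmatrix} \\ &= \det \begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix}. \end{aligned}$$

The message is that to take the box product, you can simply take the determinant of the matrix which results by letting the rows be the rectangular components of the given vectors in the order in which they occur in the box product. More will be presented on determinants later.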
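As a quick check of this observation, the following Python sketch compares the cofactor expansion along the first row with the box product computed directly. The function names `det3` and `box` are invented for this illustration, and the vectors are the ones from Example 3.4.11.

```python
def det3(m):
    """Determinant of a 3x3 matrix by expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def box(u, v, w):
    """Box product u . (v x w) computed from components."""
    v_cross_w = (v[1] * w[2] - v[2] * w[1],
                 v[2] * w[0] - v[0] * w[2],
                 v[0] * w[1] - v[1] * w[0])
    return sum(x * y for x, y in zip(u, v_cross_w))


u, v, w = (1, 2, -5), (1, 3, -6), (3, 2, 3)
print(det3([u, v, w]))  # 14
print(box(u, v, w))     # 14
```

Both numbers agree with the volume found earlier, as the observation predicts.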



3.5 The Vector Identity Machine 

In practice, you often have to deal with combinations of several cross products mixed in with dot products. It is extremely useful to have a technique which will allow you to discover vector identities and simplify expressions involving cross and dot products in three dimensions. This involves two special symbols, $\delta_{ij}$ and $\varepsilon_{ijk}$, which are very useful in dealing with vector identities. To begin with, here is the definition of these symbols.

Definition 3.5.1 The symbol $\delta_{ij}$, called the Kronecker delta symbol, is defined as follows.

$$\delta_{ij} \equiv \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases}$$

With the Kronecker symbol, $i$ and $j$ can equal any integer in $\{1, 2, \cdots, n\}$ for any $n \in \mathbb{N}$.

Definition 3.5.2 For $i$, $j$, and $k$ integers in the set $\{1, 2, 3\}$, $\varepsilon_{ijk}$ is defined as follows.

$$\varepsilon_{ijk} \equiv \begin{cases} 1 & \text{if } (i,j,k) = (1,2,3), (2,3,1), \text{ or } (3,1,2) \\ -1 & \text{if } (i,j,k) = (2,1,3), (1,3,2), \text{ or } (3,2,1) \\ 0 & \text{if there are any repeated integers} \end{cases}$$

The subscripts $ijk$ and $ij$ in the above are called indices. A single one is called an index. This symbol $\varepsilon_{ijk}$ is also called the permutation symbol.

The way to think of $\varepsilon_{ijk}$ is that $\varepsilon_{123} = 1$ and if you switch any two of the numbers in the list $i, j, k$, it changes the sign. Thus $\varepsilon_{ijk} = -\varepsilon_{jik}$ and $\varepsilon_{ijk} = -\varepsilon_{kji}$ etc. You should check that this rule reduces to the above definition. For example, it immediately implies that if there is a repeated index, the answer is zero. This follows because $\varepsilon_{iij} = -\varepsilon_{iij}$ and so $\varepsilon_{iij} = 0$.

It is useful to use the Einstein summation convention when dealing with these symbols. Simply stated, the convention is that you sum over the repeated index. Thus $a_i b_i$ means $\sum_i a_i b_i$. Also, $\delta_{ij} x_j$ means $\sum_j \delta_{ij} x_j = x_i$. Thus $\delta_{ij} x_j = x_i$, $\delta_{ii} = 3$. When you use this convention, there is one very important thing to never forget. It is this: Never have an index be repeated more than once. Thus $a_i b_i$ is all right but $a_{ii} b_i$ is not. The reason for this is that you end up getting confused about what is meant. If you want to write $\sum_i a_i b_i c_i$ it is best to simply use the summation notation. There is a very important reduction identity connecting these two symbols.

Lemma 3.5.3 The following holds.

$$\varepsilon_{ijk} \varepsilon_{irs} = (\delta_{jr} \delta_{ks} - \delta_{kr} \delta_{js}).$$

Proof: If $\{j, k\} \neq \{r, s\}$ then in every term of the sum on the left either $\varepsilon_{ijk}$ or $\varepsilon_{irs}$ contains a repeated index. Therefore, the left side equals zero. The right side also equals zero in this case. To see this, note that if the two sets are not equal, then there is one of the indices in one of the sets which is not in the other set. For example, it could be that $j$ is not equal to either $r$ or $s$. Then the right side equals zero.

Therefore, it can be assumed $\{j, k\} = \{r, s\}$. If $j = r$ and $k = s$ for $s \neq r$, then there is exactly one term in the sum on the left which is nonzero and it equals 1. The right also reduces to 1 in this case. If instead $j = s$ and $k = r$, there is exactly one term in the sum on the left which is nonzero and it must equal $-1$. The right side also reduces to $-1$ in this case. If there is a repeated index in $\{j, k\}$, then every term in the sum on the left equals zero. The right also reduces to zero in this case because then $j = k = r = s$ and so the right side becomes $(1)(1) - (1)(1) = 0$. ■
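Since each index only takes the values 1, 2, 3, the reduction identity can also be confirmed by brute force. The following Python sketch does this; the names `delta`, `eps`, and `identity_holds` are invented for this illustration, and the sum over the repeated index $i$ is written out explicitly.

```python
from itertools import product

EVEN = {(1, 2, 3), (2, 3, 1), (3, 1, 2)}
ODD = {(2, 1, 3), (1, 3, 2), (3, 2, 1)}


def delta(i, j):
    """Kronecker delta."""
    return 1 if i == j else 0


def eps(i, j, k):
    """Permutation symbol for indices in {1, 2, 3}."""
    if (i, j, k) in EVEN:
        return 1
    if (i, j, k) in ODD:
        return -1
    return 0  # some index is repeated


indices = (1, 2, 3)
# Check eps_ijk eps_irs = delta_jr delta_ks - delta_kr delta_js for all j, k, r, s.
identity_holds = all(
    sum(eps(i, j, k) * eps(i, r, s) for i in indices)
    == delta(j, r) * delta(k, s) - delta(k, r) * delta(j, s)
    for j, k, r, s in product(indices, repeat=4)
)
print(identity_holds)  # True
```

This checks all 81 choices of $(j, k, r, s)$, which is no substitute for the proof but is a useful sanity check.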

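Proposition 3.5.4 Let $\mathbf{u}, \mathbf{v}$ be vectors in $\mathbb{R}^n$ where the Cartesian coordinates of $\mathbf{u}$ are $(u_1, \cdots, u_n)$ and the Cartesian coordinates of $\mathbf{v}$ are $(v_1, \cdots, v_n)$. Then $\mathbf{u} \cdot \mathbf{v} = u_i v_i$. If $\mathbf{u}, \mathbf{v}$ are vectors in $\mathbb{R}^3$, then

$$(\mathbf{u} \times \mathbf{v})_i = \varepsilon_{ijk} u_j v_k.$$

Also, $\delta_{ik} a_k = a_i$.

Proof: The first claim is obvious from the definition of the dot product. The second is verified by simply checking that it works. For example,

$$\mathbf{u} \times \mathbf{v} \equiv \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \end{vmatrix}$$

and so

$$(\mathbf{u} \times \mathbf{v})_1 = (u_2 v_3 - u_3 v_2).$$

From the above formula in the proposition,

$$\varepsilon_{1jk} u_j v_k \equiv u_2 v_3 - u_3 v_2,$$

the same thing. The cases for $(\mathbf{u} \times \mathbf{v})_2$ and $(\mathbf{u} \times \mathbf{v})_3$ are verified similarly. The last claim follows directly from the definition. ■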

With this notation, you can easily discover vector identities and simplify expressions which 
involve the cross product. 

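Example 3.5.5 Discover a formula which simplifies $(\mathbf{u} \times \mathbf{v}) \cdot (\mathbf{z} \times \mathbf{w})$, $\mathbf{u}, \mathbf{v}, \mathbf{z}, \mathbf{w} \in \mathbb{R}^3$.

From the above description of the cross product and dot product, along with the reduction identity,

$$\begin{aligned} (\mathbf{u} \times \mathbf{v}) \cdot (\mathbf{z} \times \mathbf{w}) &= \varepsilon_{ijk} u_j v_k \varepsilon_{irs} z_r w_s = (\delta_{jr} \delta_{ks} - \delta_{js} \delta_{kr}) u_j v_k z_r w_s \\ &= u_j v_k z_j w_k - u_j v_k z_k w_j \\ &= (\mathbf{u} \cdot \mathbf{z})(\mathbf{v} \cdot \mathbf{w}) - (\mathbf{u} \cdot \mathbf{w})(\mathbf{v} \cdot \mathbf{z}) \end{aligned}$$
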
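Example 3.5.6 Simplify $\mathbf{u} \times (\mathbf{u} \times \mathbf{v})$.

The $i^{th}$ component is

$$\begin{aligned} \varepsilon_{ijk} u_j (\mathbf{u} \times \mathbf{v})_k &= \varepsilon_{ijk} u_j \varepsilon_{krs} u_r v_s = \varepsilon_{kij} \varepsilon_{krs} u_j u_r v_s \\ &= (\delta_{ir} \delta_{js} - \delta_{jr} \delta_{is}) u_j u_r v_s \\ &= u_j u_i v_j - u_j u_j v_i \\ &= (\mathbf{u} \cdot \mathbf{v}) u_i - |\mathbf{u}|^2 v_i \end{aligned}$$

Hence

$$\mathbf{u} \times (\mathbf{u} \times \mathbf{v}) = (\mathbf{u} \cdot \mathbf{v}) \mathbf{u} - |\mathbf{u}|^2 \mathbf{v}$$

because the $i^{th}$ components of the two sides are equal for any $i$.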
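Identities found this way are easy to spot check numerically. The sketch below tests the formulas of Examples 3.5.5 and 3.5.6 on one arbitrary choice of integer vectors; the helper names and the particular vectors are made up for this illustration, and agreement on one example is of course not a proof.

```python
def cross(a, b):
    """Cross product of two vectors in R^3 given as 3-tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    """Dot product of two vectors with the same number of components."""
    return sum(x * y for x, y in zip(a, b))


# Arbitrary test vectors (assumed values, chosen only for illustration).
u, v, z, w = (1, 2, 3), (-2, 0, 5), (4, -1, 2), (3, 3, -7)

# Example 3.5.5: (u x v) . (z x w) = (u . z)(v . w) - (u . w)(v . z)
lhs_dot = dot(cross(u, v), cross(z, w))
rhs_dot = dot(u, z) * dot(v, w) - dot(u, w) * dot(v, z)

# Example 3.5.6: u x (u x v) = (u . v) u - |u|^2 v
lhs_vec = cross(u, cross(u, v))
rhs_vec = tuple(dot(u, v) * ui - dot(u, u) * vi for ui, vi in zip(u, v))

print(lhs_dot == rhs_dot, lhs_vec == rhs_vec)
```

Integer components are used so that the comparisons are exact rather than subject to rounding.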

3.6 Exercises 

1. Show that if $\mathbf{a} \times \mathbf{u} = \mathbf{0}$ for all unit vectors $\mathbf{u}$, then $\mathbf{a} = \mathbf{0}$.

2. Find the area of the triangle determined by the three points, (1, 2, 3) , (4, 2, 0) and (—3, 2, 1) . 

3. Find the area of the triangle determined by the three points, (1, 0, 3) , (4, 1, 0) and (—3, 1, 1) . 

4. Find the area of the triangle determined by the three points, (1,2,3) , (2,3,4) and (3,4,5) . 
Did something interesting happen here? What does it mean geometrically? 

5. Find the area of the parallelogram determined by the vectors, (1, 2, 3), (3, —2, 1) . 

6. Find the area of the parallelogram determined by the vectors, (1, 0, 3), (4, —2, 1) . 

7. Find the volume of the parallelepiped determined by the vectors $\mathbf{i} - 7\mathbf{j} - 5\mathbf{k}$, $\mathbf{i} - 2\mathbf{j} - 6\mathbf{k}$, $3\mathbf{i} + 2\mathbf{j} + 3\mathbf{k}$.

8. Suppose a, b, and c are three vectors whose components are all integers. Can you conclude the 
volume of the parallelepiped determined from these three vectors will always be an integer? 

9. What does it mean geometrically if the box product of three vectors gives zero? 

10. Using Problem 9, find an equation of a plane containing the two position vectors, $\mathbf{a}$ and $\mathbf{b}$ and the point $\mathbf{0}$. Hint: If $(x, y, z)$ is a point on this plane the volume of the parallelepiped determined by $(x, y, z)$ and the vectors $\mathbf{a}, \mathbf{b}$ equals 0.

11. Using the notion of the box product yielding either plus or minus the volume of the parallelepiped determined by the given three vectors, show that

$$(\mathbf{a} \times \mathbf{b}) \cdot \mathbf{c} = \mathbf{a} \cdot (\mathbf{b} \times \mathbf{c})$$

In other words, the dot and the cross can be switched as long as the order of the vectors remains the same. Hint: There are two ways to do this, by the coordinate description of the dot and cross product and by geometric reasoning. It is better if you use geometric reasoning.

12. Is $\mathbf{a} \times (\mathbf{b} \times \mathbf{c}) = (\mathbf{a} \times \mathbf{b}) \times \mathbf{c}$? What is the meaning of $\mathbf{a} \times \mathbf{b} \times \mathbf{c}$? Explain. Hint: Try $(\mathbf{i} \times \mathbf{j}) \times \mathbf{j}$.

13. Discover a vector identity for $(\mathbf{u} \times \mathbf{v}) \times \mathbf{w}$ and one for $\mathbf{u} \times (\mathbf{v} \times \mathbf{w})$.

14. Discover a vector identity for $(\mathbf{u} \times \mathbf{v}) \times (\mathbf{z} \times \mathbf{w})$.

15. Simplify $(\mathbf{u} \times \mathbf{v}) \cdot (\mathbf{v} \times \mathbf{w}) \times (\mathbf{w} \times \mathbf{z})$.

16. Simplify $|\mathbf{u} \times \mathbf{v}|^2 + (\mathbf{u} \cdot \mathbf{v})^2 - |\mathbf{u}|^2 |\mathbf{v}|^2$.

17. For $\mathbf{u}, \mathbf{v}, \mathbf{w}$ functions of $t$, show the product rules

$$(\mathbf{u} \times \mathbf{v})' = \mathbf{u}' \times \mathbf{v} + \mathbf{u} \times \mathbf{v}'$$
$$(\mathbf{u} \cdot \mathbf{v})' = \mathbf{u}' \cdot \mathbf{v} + \mathbf{u} \cdot \mathbf{v}'$$

18. If $\mathbf{u}$ is a function of $t$, and the magnitude $|\mathbf{u}(t)|$ is a constant, show from the above problem that the velocity $\mathbf{u}'$ is perpendicular to $\mathbf{u}$.

19. When you have a rotating rigid body with angular velocity vector $\mathbf{\Omega}$, then the velocity vector $\mathbf{v} \equiv \mathbf{u}'$ is given by

$$\mathbf{v} = \mathbf{\Omega} \times \mathbf{u}$$

where $\mathbf{u}$ is a position vector. The acceleration is the derivative of the velocity. Show that if $\mathbf{\Omega}$ is a constant vector, then the acceleration vector $\mathbf{a} = \mathbf{v}'$ is given by the formula

$$\mathbf{a} = \mathbf{\Omega} \times (\mathbf{\Omega} \times \mathbf{u}).$$

Now simplify the expression. It turns out this is centripetal acceleration.

20. Verify directly that the coordinate description of the cross product, $\mathbf{a} \times \mathbf{b}$, has the property that it is perpendicular to both $\mathbf{a}$ and $\mathbf{b}$. Then show by direct computation that this coordinate description satisfies

$$\begin{aligned} |\mathbf{a} \times \mathbf{b}|^2 &= |\mathbf{a}|^2 |\mathbf{b}|^2 - (\mathbf{a} \cdot \mathbf{b})^2 \\ &= |\mathbf{a}|^2 |\mathbf{b}|^2 (1 - \cos^2(\theta)) \end{aligned}$$

where $\theta$ is the angle included between the two vectors. Explain why $|\mathbf{a} \times \mathbf{b}|$ has the correct magnitude. All that is missing is the material about the right hand rule. Verify directly from the coordinate description of the cross product that the right thing happens with regards to the vectors $\mathbf{i}, \mathbf{j}, \mathbf{k}$. Next verify that the distributive law holds for the coordinate description of the cross product. This gives another way to approach the cross product. First define it in terms of coordinates and then get the geometric properties from this. However, this approach does not yield the right hand rule property very easily.

Systems Of Equations



4.1 Systems Of Equations, Geometry 

As you know, equations like $2x + 3y = 6$ can be graphed as straight lines in $\mathbb{R}^2$. To find the solution to two such equations, you could graph the two straight lines and the ordered pairs identifying the point (or points) of intersection would give the $x$ and $y$ values of the solution to the two equations because such an ordered pair satisfies both equations. The following picture illustrates what can occur with two equations involving two variables.

[Figure: three panels showing two lines in the plane: one solution; two parallel lines, no solutions; infinitely many solutions.]

In the first example of the above picture, there is a unique point of intersection. In the second, there are no points of intersection. The other thing which can occur is that the two lines are really the same line. For example, $x + y = 1$ and $2x + 2y = 2$ are relations which when graphed yield the same line. In this case there are infinitely many points in the simultaneous solution of these two equations, every ordered pair which is on the graph of the line. It is always this way when considering linear systems of equations. There is either no solution, exactly one or infinitely many although the reasons for this are not completely comprehended by considering a simple picture in two dimensions, $\mathbb{R}^2$.



Example 4.1.1 Find the solution to the system $x + y = 3$, $y - x = 5$.

You can verify the solution is $(x, y) = (-1, 4)$. You can see this geometrically by graphing the equations of the two lines. If you do so correctly, you should obtain a graph which looks something like the following in which the point of intersection represents the solution of the two equations.

[Figure: the two lines intersecting at the point $(x, y) = (-1, 4)$.]



Example 4.1.2 You can also imagine other situations such as the case of three intersecting lines having no common point of intersection or three intersecting lines which do intersect at a single point as illustrated in the following picture.

[Figure: two panels, each showing three lines: no common point of intersection; a single common point of intersection.]

In the case of the first picture above, there would be no solution to the three equations whose graphs are the given lines. In the case of the second picture there is a solution to the three equations whose graphs are the given lines.

The points $(x, y, z)$ satisfying an equation in three variables like $2x + 4y - 5z = 8$ form a plane¹ and geometrically, when you solve systems of equations involving three variables, you are taking intersections of planes. Consider the following picture involving two planes.

[Figure: two planes intersecting in a line.]

Notice how these two planes intersect in a line. It could also happen the two planes could fail to intersect.

¹ Don't worry about why this is at this time. It is not important. The following discussion is intended to show you that geometric considerations like this don't take you anywhere. It is the algebraic procedures which are important and lead to important applications.

Now imagine a third plane. One thing that could happen is this third plane could have an intersection with one of the first planes which results in a line which fails to intersect the first line as illustrated in the following picture.

[Figure: the two planes together with a new plane, labeled "New Plane", which meets one of them in a line parallel to the first line.]

Thus there is no point which lies in all three planes. The picture illustrates the situation in which the line of intersection of the new plane with one of the original planes forms a line parallel to the line of intersection of the first two planes. However, in three dimensions, it is possible for two lines to fail to intersect even though they are not parallel. Such lines are called skew lines. You might consider whether there exist two skew lines, each of which is the intersection of a pair of planes selected from a set of exactly three planes such that there is no point of intersection between the three planes. You can also see that if you tilt one of the planes you could obtain every pair of planes having a nonempty intersection in a line and yet there may be no point in the intersection of all three.

It could happen also that the three planes could intersect in a single point as shown in the following picture.

[Figure: the two planes together with a new plane, labeled "New Plane", meeting them in a single point.]

In this case, the three planes have a single point of intersection. The three planes could also intersect in a line.

Thus in the case of three equations having three variables, the planes determined by these equations could intersect in a single point, a line, or even fail to intersect at all. You see that in three dimensions there are many possibilities. If you want to waste some time, you can try to imagine all the things which could happen but this will not help for more variables than 3 which is where many of the important applications lie.

Relations like $x + y - 2z + 4w = 8$ are often called hyper-planes.² However, it is impossible to draw pictures of such things. The only rational and useful way to deal with this subject is through the use of algebra not art. Mathematics exists partly to free us from having to always draw pictures in order to draw conclusions.

4.2 Systems Of Equations, Algebraic Procedures 

4.2.1 Elementary Operations 

Consider the following example. 

Example 4.2.1 Find $x$ and $y$ such that

$$x + y = 7 \text{ and } 2x - y = 8. \tag{4.1}$$

The set of ordered pairs, (x, y) which solve both equations is called the solution set. 

You can verify that (x,y) = (5,2) is a solution to the above system. The interesting question is 
this: If you were not given this information to verify, how could you determine the solution? You 
can do this by using the following basic operations on the equations, none of which change the set 
of solutions of the system of equations. 

Definition 4.2.2 Elementary operations are those operations consisting of the following. 

1. Interchange the order in which the equations are listed. 

2. Multiply any equation by a nonzero number. 

3. Replace any equation with itself added to a multiple of another equation. 

Example 4.2.3 To illustrate the third of these operations on this particular system, consider the following.

$$\begin{aligned} x + y &= 7 \\ 2x - y &= 8 \end{aligned}$$

The system has the same solution set as the system

$$\begin{aligned} x + y &= 7 \\ -3y &= -6 \end{aligned}$$

To obtain the second system, take the second equation of the first system and add $-2$ times the first equation to obtain

$$-3y = -6.$$

Now, this clearly shows that $y = 2$ and so it follows from the other equation that $x + 2 = 7$ and so $x = 5$.

² The evocative semi word, "hyper" conveys absolutely no meaning but is traditional usage which makes the terminology sound more impressive than something like long wide flat thing. Later we will discuss some terms which are not just evocative but yield real understanding.

Of course a linear system may involve many equations and many variables. The solution set is 
still the collection of solutions to the equations. In every case, the above operations of Definition 
4.2.2 do not change the set of solutions to the system of linear equations. 

Theorem 4.2.4 Suppose you have two equations, involving the variables $(x_1, \cdots, x_n)$,

$$E_1 = f_1, \quad E_2 = f_2 \tag{4.2}$$

where $E_1$ and $E_2$ are expressions involving the variables and $f_1$ and $f_2$ are constants. (In the above example there are only two variables, $x$ and $y$, and $E_1 = x + y$ while $E_2 = 2x - y$.) Then the system $E_1 = f_1$, $E_2 = f_2$ has the same solution set as

$$E_1 = f_1, \quad E_2 + aE_1 = f_2 + af_1. \tag{4.3}$$

Also the system $E_1 = f_1$, $E_2 = f_2$ has the same solutions as the system $E_2 = f_2$, $E_1 = f_1$. The system $E_1 = f_1$, $E_2 = f_2$ has the same solution as the system $E_1 = f_1$, $aE_2 = af_2$ provided $a \neq 0$.

Proof: If $(x_1, \cdots, x_n)$ solves $E_1 = f_1$, $E_2 = f_2$ then it solves the first equation in $E_1 = f_1$, $E_2 + aE_1 = f_2 + af_1$. Also, it satisfies $aE_1 = af_1$ and so, since it also solves $E_2 = f_2$, it must solve $E_2 + aE_1 = f_2 + af_1$. Therefore, if $(x_1, \cdots, x_n)$ solves $E_1 = f_1$, $E_2 = f_2$ it must also solve $E_2 + aE_1 = f_2 + af_1$. On the other hand, if it solves the system $E_1 = f_1$ and $E_2 + aE_1 = f_2 + af_1$, then $aE_1 = af_1$ and so you can subtract these equal quantities from both sides of $E_2 + aE_1 = f_2 + af_1$ to obtain $E_2 = f_2$, showing that it satisfies $E_1 = f_1$, $E_2 = f_2$.

The second assertion of the theorem, which says that the system $E_1 = f_1$, $E_2 = f_2$ has the same solution as the system $E_2 = f_2$, $E_1 = f_1$, is seen to be true because it involves nothing more than listing the two equations in a different order. They are the same equations.

The third assertion of the theorem, which says $E_1 = f_1$, $E_2 = f_2$ has the same solution as the system $E_1 = f_1$, $aE_2 = af_2$ provided $a \neq 0$, is verified as follows: If $(x_1, \cdots, x_n)$ is a solution of $E_1 = f_1$, $E_2 = f_2$, then it is a solution to $E_1 = f_1$, $aE_2 = af_2$ because the second system only involves multiplying the equation $E_2 = f_2$ by $a$. If $(x_1, \cdots, x_n)$ is a solution of $E_1 = f_1$, $aE_2 = af_2$, then upon multiplying $aE_2 = af_2$ by the number $1/a$, you find that $E_2 = f_2$. ■

Stated simply, the above theorem shows that the elementary operations do not change the solution set of a system of equations.

Here is an example in which there are three equations and three variables. You want to find values for $x, y, z$ such that each of the given equations is satisfied when these values are plugged in to the equations.

Example 4.2.5 Find the solutions to the system,

$$\begin{aligned} x + 3y + 6z &= 25 \\ 2x + 7y + 14z &= 58 \\ 2y + 5z &= 19 \end{aligned} \tag{4.4}$$

To solve this system replace the second equation by $(-2)$ times the first equation added to the second. This yields the system

$$\begin{aligned} x + 3y + 6z &= 25 \\ y + 2z &= 8 \\ 2y + 5z &= 19 \end{aligned} \tag{4.5}$$

Now take $(-2)$ times the second and add to the third. More precisely, replace the third equation with $(-2)$ times the second added to the third. This yields the system

$$\begin{aligned} x + 3y + 6z &= 25 \\ y + 2z &= 8 \\ z &= 3 \end{aligned} \tag{4.6}$$

At this point, you can tell what the solution is. This system has the same solution as the original system and in the above, $z = 3$. Then using this in the second equation, it follows $y + 6 = 8$ and so $y = 2$. Now using this in the top equation yields $x + 6 + 18 = 25$ and so $x = 1$. This process is called back substitution.

Alternatively, in 4.6 you could have continued as follows. Add $(-2)$ times the bottom equation to the middle and then add $(-6)$ times the bottom to the top. This yields

$$\begin{aligned} x + 3y &= 7 \\ y &= 2 \\ z &= 3 \end{aligned}$$

Now add $(-3)$ times the second to the top. This yields

$$\begin{aligned} x &= 1 \\ y &= 2 \\ z &= 3 \end{aligned}$$

a system which has the same solution set as the original system. This avoided back substitution and led to the same solution set.
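The two elementary operations used in this example, followed by back substitution, can be sketched in Python as follows. The function names `add_multiple` and `back_substitute` are invented for this illustration, each equation is stored as a list of its coefficients followed by its right hand side, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def add_multiple(rows, target, source, factor):
    """Replace row `target` by itself plus `factor` times row `source`."""
    rows[target] = [t + factor * s for t, s in zip(rows[target], rows[source])]


def back_substitute(rows):
    """Solve a triangular system whose diagonal entries are all nonzero."""
    n = len(rows)
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        known = sum(rows[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (rows[i][n] - known) / rows[i][i]
    return x


# System 4.4: x + 3y + 6z = 25, 2x + 7y + 14z = 58, 2y + 5z = 19.
system = [[Fraction(v) for v in row]
          for row in [[1, 3, 6, 25], [2, 7, 14, 58], [0, 2, 5, 19]]]

add_multiple(system, 1, 0, -2)  # (-2) times the first added to the second
add_multiple(system, 2, 1, -2)  # (-2) times the second added to the third
solution = back_substitute(system)
print([int(v) for v in solution])  # [1, 2, 3]
```

After the two operations the list `system` holds the coefficients of 4.6, and back substitution recovers $x = 1$, $y = 2$, $z = 3$.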

4.2.2 Gauss Elimination 

A less cumbersome way to represent a linear system is to write it as an augmented matrix. For example the linear system 4.4 can be written as

$$\left(\begin{array}{ccc|c} 1 & 3 & 6 & 25 \\ 2 & 7 & 14 & 58 \\ 0 & 2 & 5 & 19 \end{array}\right).$$

It has exactly the same information as the original system but here it is understood there is an $x$ column $\begin{pmatrix} 1 \\ 2 \\ 0 \end{pmatrix}$, a $y$ column $\begin{pmatrix} 3 \\ 7 \\ 2 \end{pmatrix}$, and a $z$ column $\begin{pmatrix} 6 \\ 14 \\ 5 \end{pmatrix}$. The rows correspond to the equations in the system. Thus the top row in the augmented matrix corresponds to the equation,

$$x + 3y + 6z = 25.$$

Now when you replace an equation with a multiple of another equation added to itself, you are just taking a row of this augmented matrix and replacing it with a multiple of another row added to it. Thus the first step in solving 4.4 would be to take $(-2)$ times the first row of the augmented matrix above and add it to the second row,

$$\left(\begin{array}{ccc|c} 1 & 3 & 6 & 25 \\ 0 & 1 & 2 & 8 \\ 0 & 2 & 5 & 19 \end{array}\right).$$

Note how this corresponds to 4.5. Next take $(-2)$ times the second row and add to the third,

$$\left(\begin{array}{ccc|c} 1 & 3 & 6 & 25 \\ 0 & 1 & 2 & 8 \\ 0 & 0 & 1 & 3 \end{array}\right).$$

This augmented matrix corresponds to the system

$$\begin{aligned} x + 3y + 6z &= 25 \\ y + 2z &= 8 \\ z &= 3 \end{aligned}$$

which is the same as 4.6. By back substitution you obtain the solution $x = 1$, $y = 2$, and $z = 3$.

In general a linear system is of the form

$$\begin{aligned} a_{11} x_1 + \cdots + a_{1n} x_n &= b_1 \\ &\;\;\vdots \\ a_{m1} x_1 + \cdots + a_{mn} x_n &= b_m \end{aligned} \tag{4.7}$$

where the $x_i$ are variables and the $a_{ij}$ and $b_i$ are constants. This system can be represented by the augmented matrix

$$\left(\begin{array}{ccc|c} a_{11} & \cdots & a_{1n} & b_1 \\ \vdots & & \vdots & \vdots \\ a_{m1} & \cdots & a_{mn} & b_m \end{array}\right). \tag{4.8}$$

Changes to the system of equations in 4.7 as a result of an elementary operation translate into changes of the augmented matrix resulting from a row operation. Note that Theorem 4.2.4 implies that the row operations deliver an augmented matrix for a system of equations which has the same solution set as the original system.

Definition 4.2.6 The row operations consist of the following 

1. Switch two rows. 

2. Multiply a row by a nonzero number. 

3. Replace a row by a multiple of another row added to it. 

Gauss elimination is a systematic procedure to simplify an augmented matrix to a reduced 
form. In the following definition, the term "leading entry" refers to the first nonzero entry of a 
row when scanning the row from left to right. 

Definition 4.2.7 An augmented matrix is in echelon form if 

1. All nonzero rows are above any rows of zeros. 

2. Each leading entry of a row is in a column to the right of the leading entries of any rows above 
it. 

Definition 4.2.8 An augmented matrix is in row reduced echelon form if 

1. All nonzero rows are above any rows of zeros. 

2. Each leading entry of a row is in a column to the right of the leading entries of any rows above 
it. 

3. All entries in a column above and below a leading entry are zero.

4. Each leading entry is a 1, the only nonzero entry in its column.
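The conditions in these two definitions are mechanical enough to be checked by a short program. Here is one possible sketch in Python; the function names `leading_index`, `is_echelon`, and `is_rref` are made up for this illustration, and a matrix is stored as a list of rows.

```python
def leading_index(row):
    """Column index of the leading entry, or None for a row of zeros."""
    for j, value in enumerate(row):
        if value != 0:
            return j
    return None


def is_echelon(m):
    """Check the two conditions for echelon form."""
    last_lead = -1
    seen_zero_row = False
    for row in m:
        lead = leading_index(row)
        if lead is None:
            seen_zero_row = True
        elif seen_zero_row or lead <= last_lead:
            # A nonzero row below a zero row, or a leading entry that is
            # not strictly to the right of the one above it.
            return False
        else:
            last_lead = lead
    return True


def is_rref(m):
    """Check the conditions for row reduced echelon form."""
    if not is_echelon(m):
        return False
    for i, row in enumerate(m):
        lead = leading_index(row)
        if lead is None:
            continue
        if row[lead] != 1:
            return False
        if any(other[lead] != 0 for k, other in enumerate(m) if k != i):
            return False
    return True


print(is_echelon([[1, 3, 6, 25], [0, 1, 2, 8], [0, 0, 1, 3]]))  # True
print(is_rref([[1, 3, 6, 25], [0, 1, 2, 8], [0, 0, 1, 3]]))     # False
print(is_rref([[1, 0, 0, 1], [0, 1, 0, 2], [0, 0, 1, 3]]))      # True
```

The first matrix is the augmented matrix corresponding to 4.6, which is in echelon form but not in row reduced echelon form.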

Example 4.2.9 Here are some augmented matrices which are in row reduced echelon form.

[Display: two augmented matrices in row reduced echelon form; the entries are not recoverable.]

Example 4.2.10 Here are augmented matrices which are in echelon form but not in row reduced echelon form.

[Display: two augmented matrices in echelon form; the entries are not recoverable.]

Example 4.2.11 Here are some augmented matrices which are not in echelon form.

[Display: augmented matrices which are not in echelon form; the entries are not recoverable.]

Definition 4.2.12 A pivot position in a matrix is the location of a leading entry in an echelon form resulting from the application of row operations to the matrix. A pivot column is a column that contains a pivot position.

For example consider the following. 

Example 4.2.13 Suppose

$$\left(\begin{array}{ccc|c} 1 & 2 & 3 & 4 \\ 3 & 2 & 1 & 6 \\ 4 & 4 & 4 & 10 \end{array}\right)$$

Where are the pivot positions and pivot columns?

Replace the second row by $-3$ times the first added to the second. This yields

$$\left(\begin{array}{ccc|c} 1 & 2 & 3 & 4 \\ 0 & -4 & -8 & -6 \\ 4 & 4 & 4 & 10 \end{array}\right).$$

This is not in reduced echelon form so replace the bottom row by $-4$ times the top row added to the bottom. This yields

$$\left(\begin{array}{ccc|c} 1 & 2 & 3 & 4 \\ 0 & -4 & -8 & -6 \\ 0 & -4 & -8 & -6 \end{array}\right).$$

This is still not in reduced echelon form. Replace the bottom row by $-1$ times the middle row added to the bottom. This yields

$$\left(\begin{array}{ccc|c} 1 & 2 & 3 & 4 \\ 0 & -4 & -8 & -6 \\ 0 & 0 & 0 & 0 \end{array}\right)$$

which is in echelon form, although not in reduced echelon form. Therefore, the pivot positions in the original matrix are the locations corresponding to the first row and first column and the second row and second column, as shown by the boxed entries in the following:

$$\left(\begin{array}{ccc|c} \boxed{1} & 2 & 3 & 4 \\ 3 & \boxed{2} & 1 & 6 \\ 4 & 4 & 4 & 10 \end{array}\right)$$

Thus the pivot columns in the matrix are the first two columns.

The following is the algorithm for obtaining a matrix which is in row reduced echelon form. 

Algorithm 4.2.14 

This algorithm tells how to start with a matrix and do row operations on it in such a way as to 
end up with a matrix in row reduced echelon form. 

1. Find the first nonzero column from the left. This is the first pivot column. The position at 
the top of the first pivot column is the first pivot position. Switch rows if necessary to place 
a nonzero number in the first pivot position. 

2. Use row operations to zero out the entries below the first pivot position. 

3. Ignore the row containing the most recent pivot position identified and the rows above it. 
Repeat steps 1 and 2 to the remaining sub-matrix, the rectangular array of numbers obtained 
from the original matrix by deleting the rows you just ignored. Repeat the process until there 
are no more rows to modify. The matrix will then be in echelon form. 

4. Moving from right to left, use the nonzero elements in the pivot positions to zero out the 
elements in the pivot columns which are above the pivots. 

5. Divide each nonzero row by the value of the leading entry. The result will be a matrix in row 
reduced echelon form. 

This row reduction procedure applies to both augmented matrices and non augmented matrices. 
There is nothing special about the augmented column with respect to the row reduction procedure. 
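Here is one possible Python sketch of this algorithm, using exact fractions. The function name `rref` and the returned list of pivot columns are choices made for this illustration only. The forward loop carries out steps 1 through 3, and the two loops after it carry out steps 4 and 5.

```python
from fractions import Fraction


def rref(matrix):
    """Return (row reduced echelon form, list of pivot column indices)."""
    m = [[Fraction(v) for v in row] for row in matrix]
    rows, cols = len(m), len(m[0])
    pivots = []  # (row, column) of each pivot position
    r = 0
    # Steps 1-3: work left to right, producing an echelon form.
    for c in range(cols):
        if r == rows:
            break
        # Look for a nonzero entry in column c at or below row r.
        p = next((i for i in range(r, rows) if m[i][c] != 0), None)
        if p is None:
            continue  # nothing to pivot on in this column
        m[r], m[p] = m[p], m[r]  # switch rows if necessary
        for i in range(r + 1, rows):
            factor = m[i][c] / m[r][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        pivots.append((r, c))
        r += 1
    # Step 4: moving right to left, zero out the entries above each pivot.
    for pr, pc in reversed(pivots):
        for i in range(pr):
            factor = m[i][pc] / m[pr][pc]
            m[i] = [a - factor * b for a, b in zip(m[i], m[pr])]
    # Step 5: divide each nonzero row by its leading entry.
    for pr, pc in pivots:
        lead = m[pr][pc]
        m[pr] = [a / lead for a in m[pr]]
    return m, [c for _, c in pivots]


# Augmented matrix of system 4.4.
reduced, pivot_columns = rref([[1, 3, 6, 25], [2, 7, 14, 58], [0, 2, 5, 19]])
print([[int(v) for v in row] for row in reduced])
# [[1, 0, 0, 1], [0, 1, 0, 2], [0, 0, 1, 3]]
print(pivot_columns)  # [0, 1, 2] (columns are counted from 0 here)
```

The last column of the result gives the solution $x = 1$, $y = 2$, $z = 3$ found earlier. Note that the column indices reported by this sketch start at 0, so the first column of the matrix is column 0.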

Example 4.2.15 Here is a matrix.

$$\begin{pmatrix} 0 & 0 & 2 & 3 & 2 \\ 0 & 1 & 1 & 4 & 3 \\ 0 & 0 & 1 & 2 & 2 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 2 & 1 \end{pmatrix}$$

Do row reductions till you obtain a matrix in echelon form. Then complete the process by producing one in row reduced echelon form.

The pivot column is the second. Hence the pivot position is the one in the first row and second column. Switch the first two rows to obtain a nonzero entry in this pivot position.

$$\begin{pmatrix} 0 & 1 & 1 & 4 & 3 \\ 0 & 0 & 2 & 3 & 2 \\ 0 & 0 & 1 & 2 & 2 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 2 & 1 \end{pmatrix}$$

Step two is not necessary because all the entries below the first pivot position in the resulting matrix are zero. Now ignore the top row and the columns to the left of this first pivot position. Thus you apply the same operations to the smaller matrix

$$\begin{pmatrix} 2 & 3 & 2 \\ 1 & 2 & 2 \\ 0 & 0 & 0 \\ 0 & 2 & 1 \end{pmatrix}$$

The next pivot column is the third, corresponding to the first in this smaller matrix, and the second pivot position is therefore the one which is in the second row and third column. In this case it is not necessary to switch any rows to place a nonzero entry in this position because there is already a nonzero entry there. Multiply the third row of the original matrix by $-2$ and then add the second row to it. This yields

$$\begin{pmatrix} 0 & 1 & 1 & 4 & 3 \\ 0 & 0 & 2 & 3 & 2 \\ 0 & 0 & 0 & -1 & -2 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 2 & 1 \end{pmatrix}$$

The next matrix the steps in the algorithm are applied to is

$$\begin{pmatrix} -1 & -2 \\ 0 & 0 \\ 2 & 1 \end{pmatrix}$$

The first pivot column is the first column in this case and no switching of rows is necessary because there is a nonzero entry in the first pivot position. Therefore, the algorithm yields for the next step

$$\begin{pmatrix} 0 & 1 & 1 & 4 & 3 \\ 0 & 0 & 2 & 3 & 2 \\ 0 & 0 & 0 & -1 & -2 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & -3 \end{pmatrix}$$

Now the algorithm will be applied to the matrix

$$\begin{pmatrix} 0 \\ -3 \end{pmatrix}$$

There is only one column and it is nonzero so this single column is the pivot column. Therefore, the algorithm yields the following matrix for the echelon form.

$$\begin{pmatrix} 0 & 1 & 1 & 4 & 3 \\ 0 & 0 & 2 & 3 & 2 \\ 0 & 0 & 0 & -1 & -2 \\ 0 & 0 & 0 & 0 & -3 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}$$

To complete placing the matrix in reduced echelon form, multiply the third row by 3 and add $-2$ times the fourth row to it. This yields

$$\begin{pmatrix} 0 & 1 & 1 & 4 & 3 \\ 0 & 0 & 2 & 3 & 2 \\ 0 & 0 & 0 & -3 & 0 \\ 0 & 0 & 0 & 0 & -3 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}$$

Next multiply the second row by 3 and take 2 times the fourth row and add to it. Then add the fourth row to the first.

$$\begin{pmatrix} 0 & 1 & 1 & 4 & 0 \\ 0 & 0 & 6 & 9 & 0 \\ 0 & 0 & 0 & -3 & 0 \\ 0 & 0 & 0 & 0 & -3 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}$$

Next work on the fourth column in the same way.

$$\begin{pmatrix} 0 & 3 & 3 & 0 & 0 \\ 0 & 0 & 6 & 0 & 0 \\ 0 & 0 & 0 & -3 & 0 \\ 0 & 0 & 0 & 0 & -3 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}$$

Take $-1/2$ times the second row and add to the first.

$$\begin{pmatrix} 0 & 3 & 0 & 0 & 0 \\ 0 & 0 & 6 & 0 & 0 \\ 0 & 0 & 0 & -3 & 0 \\ 0 & 0 & 0 & 0 & -3 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}$$

Finally, divide by the value of the leading entries in the nonzero rows.

$$\begin{pmatrix} 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}$$

The above algorithm is the way a computer would obtain a reduced echelon form for a given matrix. It is not necessary for you to pretend you are a computer but if you like to do so, the algorithm described above will work. The main idea is to do row operations in such a way as to end up with a matrix in echelon form or row reduced echelon form because when this has been done, the resulting augmented matrix will allow you to describe the solutions to the linear system of equations in a meaningful way. When you do row operations until you obtain row reduced echelon form, the process is called the Gauss Jordan method. Otherwise, it is called Gauss elimination.

Example 4.2.16 Give the complete solution to the system of equations, $5x + 10y - 7z = -2$,
$2x + 4y - 3z = -1$, and $3x + 6y + 5z = 9$.



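The augmented matrix for this system, with the second equation listed first, is

\[
\left(\begin{array}{ccc|c}
2 & 4 & -3 & -1 \\
5 & 10 & -7 & -2 \\
3 & 6 & 5 & 9
\end{array}\right)
\]

Multiply the second row by 2, the first row by 5, and then take $(-1)$ times the first row and add to
the second. Then multiply the first row by 1/5. This yields

\[
\left(\begin{array}{ccc|c}
2 & 4 & -3 & -1 \\
0 & 0 & 1 & 1 \\
3 & 6 & 5 & 9
\end{array}\right)
\]

Now, combining some row operations, take $(-3)$ times the first row and add this to 2 times the last
row and replace the last row with this. This yields

\[
\left(\begin{array}{ccc|c}
2 & 4 & -3 & -1 \\
0 & 0 & 1 & 1 \\
0 & 0 & 19 & 21
\end{array}\right)
\]

One more row operation, taking $(-19)$ times the second row and adding to the bottom, yields

\[
\left(\begin{array}{ccc|c}
2 & 4 & -3 & -1 \\
0 & 0 & 1 & 1 \\
0 & 0 & 0 & 2
\end{array}\right)
\]

This is impossible because the last row indicates the need for a solution to the equation

\[
0x + 0y + 0z = 2
\]

and there is no such thing because $0 \neq 2$. This shows there is no solution to the three given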
equations. When this happens, the system is called inconsistent. In this case it is very easy to 
describe the solution set. The system has no solution. 
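
The Gauss-Jordan procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the text's own program: the function name `rref` is hypothetical, exact arithmetic is done with the standard `fractions` module, and the augmented column is treated like any other column, so a pivot landing in it shows up as a row $(0 \ 0 \ 0 \ | \ 1)$.

```python
from fractions import Fraction

def rref(rows):
    """Return the row reduced echelon form of a matrix given as a list of rows."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivot_row = 0
    for col in range(len(m[0])):
        # Look for a nonzero entry in this column at or below the pivot row.
        r = next((i for i in range(pivot_row, len(m)) if m[i][col] != 0), None)
        if r is None:
            continue  # no pivot in this column
        m[pivot_row], m[r] = m[r], m[pivot_row]          # switch rows
        lead = m[pivot_row][col]
        m[pivot_row] = [x / lead for x in m[pivot_row]]  # make the leading entry 1
        for i in range(len(m)):
            if i != pivot_row and m[i][col] != 0:
                factor = m[i][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[pivot_row])]
        pivot_row += 1
        if pivot_row == len(m):
            break
    return m

# Augmented matrix of Example 4.2.16, equations in the order they are stated.
augmented = [[5, 10, -7, -2],
             [2, 4, -3, -1],
             [3, 6, 5, 9]]
reduced = rref(augmented)
# A row whose coefficients are all zero but whose right side is not means 0 = nonzero.
inconsistent = any(all(x == 0 for x in row[:-1]) and row[-1] != 0 for row in reduced)
print(inconsistent)
```

The row operations chosen by the sketch differ from the ones done by hand, but since row operations do not change the solution set, the conclusion is the same.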

Here is another example based on the use of row operations. 

Example 4.2.17 Give the complete solution to the system of equations, $3x - y - 5z = 9$, $y - 10z = 0$,
and $-2x + y = -6$.

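The augmented matrix of this system is

\[
\left(\begin{array}{ccc|c}
3 & -1 & -5 & 9 \\
0 & 1 & -10 & 0 \\
-2 & 1 & 0 & -6
\end{array}\right)
\]

Replace the last row with 2 times the top row added to 3 times the bottom row. This gives

\[
\left(\begin{array}{ccc|c}
3 & -1 & -5 & 9 \\
0 & 1 & -10 & 0 \\
0 & 1 & -10 & 0
\end{array}\right)
\]
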
The entry 3 in this sequence of row operations is called the pivot. It is used to create zeros in the
other places of the column. Next take $-1$ times the middle row and add to the bottom. Here the 1
in the second row is the pivot.

\[
\left(\begin{array}{ccc|c}
3 & -1 & -5 & 9 \\
0 & 1 & -10 & 0 \\
0 & 0 & 0 & 0
\end{array}\right)
\]

Take the middle row and add to the top and then divide the top row which results by 3.

\[
\left(\begin{array}{ccc|c}
1 & 0 & -5 & 3 \\
0 & 1 & -10 & 0 \\
0 & 0 & 0 & 0
\end{array}\right)
\]

This is in reduced echelon form. The equations corresponding to this reduced echelon form are
$y = 10z$ and $x = 3 + 5z$. Apparently $z$ can equal any number. Let's call this number $t$. (In this
context $t$ is called a parameter.) Therefore, the solution set of this system is $x = 3 + 5t$, $y = 10t$,
and $z = t$ where $t$ is completely arbitrary. The system has an infinite set of solutions which are
given in the above simple way. This is what it is all about, finding the solutions to the system.

There is some terminology connected to this which is useful. Recall how each column corresponds
to a variable in the original system of equations. The variables corresponding to a pivot column are 
called basic variables. The other variables are called free variables. In Example 4.2.17 there 
was one free variable, z, and two basic variables, x and y. In describing the solution to the system 
of equations, the free variables are assigned a parameter. In Example 4.2.17 this parameter was t. 
Sometimes there are many free variables and in these cases, you need to use many parameters. Here 
is another example. 

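Example 4.2.18 Find the solution to the system

\[
\begin{aligned}
x + 2y - z + w &= 3 \\
x + y - z + w &= 1 \\
x + 3y - z + w &= 5
\end{aligned}
\]

The augmented matrix is

\[
\left(\begin{array}{cccc|c}
1 & 2 & -1 & 1 & 3 \\
1 & 1 & -1 & 1 & 1 \\
1 & 3 & -1 & 1 & 5
\end{array}\right)
\]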

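Take $-1$ times the first row and add to the second. Then take $-1$ times the first row and add to
the third. This yields

\[
\left(\begin{array}{cccc|c}
1 & 2 & -1 & 1 & 3 \\
0 & -1 & 0 & 0 & -2 \\
0 & 1 & 0 & 0 & 2
\end{array}\right)
\]

Now add the second row to the bottom row

\[
\left(\begin{array}{cccc|c}
1 & 2 & -1 & 1 & 3 \\
0 & -1 & 0 & 0 & -2 \\
0 & 0 & 0 & 0 & 0
\end{array}\right)
\tag{4.9}
\]
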
This matrix is in echelon form and you see the basic variables are $x$ and $y$ while the free variables
are $z$ and $w$. Assign $s$ to $z$ and $t$ to $w$. Then the second row yields the equation, $y = 2$ while the
top equation yields the equation, $x + 2y - s + t = 3$ and so since $y = 2$, this gives $x + 4 - s + t = 3$
showing that $x = -1 + s - t$, $y = 2$, $z = s$, and $w = t$. It is customary to write this in the form

\[
\begin{pmatrix} x \\ y \\ z \\ w \end{pmatrix}
=
\begin{pmatrix} -1 + s - t \\ 2 \\ s \\ t \end{pmatrix}
\tag{4.10}
\]

This is another example of a system which has an infinite solution set but this time the solution 
set depends on two parameters, not one. Most people find it less confusing in the case of an infinite 
solution set to first place the augmented matrix in row reduced echelon form rather than just echelon 
form before seeking to write down the description of the solution. In the above, this means we don't 
stop with the echelon form 4.9. Instead we first place it in reduced echelon form as follows.

\[
\left(\begin{array}{cccc|c}
1 & 0 & -1 & 1 & -1 \\
0 & 1 & 0 & 0 & 2 \\
0 & 0 & 0 & 0 & 0
\end{array}\right)
\]

Then the solution is $y = 2$ from the second row and $x = -1 + z - w$ from the first. Thus letting
$z = s$ and $w = t$, the solution is given in 4.10.
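
A two-parameter description like 4.10 can be checked by substituting it back into the original equations. The short Python sketch below does this for a grid of parameter values; the helper names `solution` and `satisfies` are hypothetical and chosen only for this illustration.

```python
# Coefficients and right-hand sides of the system in Example 4.2.18.
system = [([1, 2, -1, 1], 3),
          ([1, 1, -1, 1], 1),
          ([1, 3, -1, 1], 5)]

def solution(s, t):
    """The solution 4.10: the free variables are z = s and w = t."""
    return [-1 + s - t, 2, s, t]

def satisfies(values):
    """True when the list (x, y, z, w) solves every equation of the system."""
    return all(sum(c * v for c, v in zip(coeffs, values)) == rhs
               for coeffs, rhs in system)

# Every choice of the two parameters should give a solution.
checks = [satisfies(solution(s, t)) for s in range(-2, 3) for t in range(-2, 3)]
print(all(checks))
```

A finite grid of values does not prove the formula, but the parameters $s$ and $t$ cancel in each equation, which is what the check reflects.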

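The number of free variables is always equal to the number of different parameters used to
describe the solution. If there are no free variables, then either there is no solution as in the case
where row operations yield an echelon form like

\[
\left(\begin{array}{cc|c}
1 & 2 & 3 \\
0 & 4 & -2 \\
0 & 0 & 1
\end{array}\right)
\]

or there is a unique solution as in the case where row operations yield an echelon form like

\[
\left(\begin{array}{ccc|c}
1 & 2 & 2 & 3 \\
0 & 4 & 3 & -2 \\
0 & 0 & 4 & 1
\end{array}\right)
\]

Also, sometimes there are free variables and no solution as in the following:

\[
\left(\begin{array}{ccc|c}
1 & 2 & 2 & 3 \\
0 & 4 & 3 & -2 \\
0 & 0 & 0 & 1
\end{array}\right)
\]
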
There are a lot of cases to consider but it is not necessary to make a major production of this. 
Do row operations till you obtain a matrix in echelon form or reduced echelon form and determine 
whether there is a solution. If there is, see if there are free variables. In this case, there will be 
infinitely many solutions. Find them by assigning different parameters to the free variables and 
obtain the solution. If there are no free variables, then there will be a unique solution which is easily 
determined once the augmented matrix is in echelon or row reduced echelon form. In every case, 
the process yields a straightforward way to describe the solutions to the linear system. As indicated 
above, you are probably less likely to become confused if you place the augmented matrix in row 
reduced echelon form rather than just echelon form. 
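
The case analysis above can be summarized as a small Python sketch. It is an illustration only: the names `pivot_columns` and `classify` are hypothetical, and the test used is that a pivot in the right-hand column means no solution, while otherwise the number of pivot columns is compared with the number of variables.

```python
from fractions import Fraction

def pivot_columns(rows):
    """Reduce to echelon form and return the indices of the pivot columns."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for col in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        pivots.append(col)
        r += 1
        if r == len(m):
            break
    return pivots

def classify(augmented):
    n = len(augmented[0]) - 1      # number of variables
    pivots = pivot_columns(augmented)
    if n in pivots:                # pivot in the right-hand column: 0 = nonzero
        return "no solution"
    # No free variables exactly when every variable column is a pivot column.
    return "unique solution" if len(pivots) == n else "infinitely many solutions"

print(classify([[5, 10, -7, -2], [2, 4, -3, -1], [3, 6, 5, 9]]))   # Example 4.2.16
print(classify([[3, -1, -5, 9], [0, 1, -10, 0], [-2, 1, 0, -6]]))  # Example 4.2.17
```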
In summary, 

Definition 4.2.19 A system of linear equations is a list of equations,

\[
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1 \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2 \\
&\ \ \vdots \\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m
\end{aligned}
\]

where $a_{ij}$ are numbers, and $b_i$ is a number. The above is a system of $m$ equations in the $n$ variables,
$x_1, x_2, \cdots, x_n$. Nothing is said about the relative size of $m$ and $n$. Written more simply in terms of
summation notation, the above can be written in the form

\[
\sum_{j=1}^{n} a_{ij}x_j = b_i, \quad i = 1, 2, 3, \cdots, m
\]

It is desired to find $(x_1, \cdots, x_n)$ solving each of the equations listed.

As illustrated above, such a system of linear equations may have a unique solution, no solution,
or infinitely many solutions and these are the only three cases which can occur for any linear system. 
Furthermore, you do exactly the same things to solve any linear system. You write the augmented 
matrix and do row operations until you get a simpler system in which it is possible to see the solution, 
usually obtaining a matrix in echelon or reduced echelon form. All is based on the observation that 
the row operations do not change the solution set. You can have more equations than variables, 
fewer equations than variables, etc. It doesn't matter. You always set up the augmented matrix and 
go to work on it. 



Definition 4.2.20 A system of linear equations is called consistent if there exists a solution. It
is called inconsistent if there is no solution.



These are reasonable words to describe the situations of having or not having a solution. If you 
think of each equation as a condition which must be satisfied by the variables, consistent would 
mean there is some choice of variables which can satisfy all the conditions. Inconsistent would mean 
there is no choice of the variables which can satisfy each of the conditions.

4.3 Exercises

1. Find the point $(x_1, y_1)$ which lies on both lines, $x + 3y = 1$ and $4x - y = 3$.

2. Solve Problem 1 graphically. That is, graph each line and see where they intersect.

3. Find the point of intersection of the two lines $3x + y = 3$ and $x + 2y = 1$.

4. Solve Problem 3 graphically. That is, graph each line and see where they intersect.

5. Do the three lines, $x + 2y = 1$, $2x - y = 1$, and $4x + 3y = 3$ have a common point of intersection?
If so, find the point and if not, tell why they don't have such a common point of intersection.

6. Do the three planes, $x + y - 3z = 2$, $2x + y + z = 1$, and $3x + 2y - 2z = 0$ have a common
point of intersection? If so, find one and if not, tell why there is no such point.

7. You have a system of k equations in two variables, k > 2. Explain the geometric significance 
of 

(a) No solution. 

(b) A unique solution. 

(c) An infinite number of solutions. 

8. Here is an augmented matrix in which * denotes an arbitrary number and ■ denotes a nonzero 
number. Determine whether the given augmented matrix is consistent. If consistent, is the 
solution unique? 

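\[
\left(\begin{array}{ccccc|c}
\blacksquare & * & * & * & * & * \\
0 & \blacksquare & * & * & 0 & * \\
0 & 0 & \blacksquare & * & * & * \\
0 & 0 & 0 & 0 & \blacksquare & *
\end{array}\right)
\]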

9. Here is an augmented matrix in which * denotes an arbitrary number and ■ denotes a nonzero 
number. Determine whether the given augmented matrix is consistent. If consistent, is the 
solution unique? 

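\[
\left(\begin{array}{ccc|c}
\blacksquare & * & * & * \\
0 & \blacksquare & * & * \\
0 & 0 & \blacksquare & *
\end{array}\right)
\]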

10. Here is an augmented matrix in which * denotes an arbitrary number and ■ denotes a nonzero 
number. Determine whether the given augmented matrix is consistent. If consistent, is the 
solution unique? 

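\[
\left(\begin{array}{ccccc|c}
\blacksquare & * & * & * & * & * \\
0 & \blacksquare & 0 & * & 0 & * \\
0 & 0 & 0 & \blacksquare & * & * \\
0 & 0 & 0 & 0 & \blacksquare & *
\end{array}\right)
\]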

11. Here is an augmented matrix in which * denotes an arbitrary number and ■ denotes a nonzero 
number. Determine whether the given augmented matrix is consistent. If consistent, is the 
solution unique? 



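\[
\left(\begin{array}{ccccc|c}
\blacksquare & * & * & * & * & * \\
0 & \blacksquare & * & * & 0 & * \\
0 & 0 & 0 & 0 & \blacksquare & 0 \\
0 & 0 & 0 & 0 & * & \blacksquare
\end{array}\right)
\]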



12. Suppose a system of equations has fewer equations than variables. Must such a system be 
consistent? If so, explain why and if not, give an example which is not consistent.

13. If a system of equations has more equations than variables, can it have a solution? If so, give
an example and if not, tell why not.

14. Find $h$ such that

\[
\left(\begin{array}{cc|c} 2 & h & 4 \\ 3 & 6 & 7 \end{array}\right)
\]

is the augmented matrix of an inconsistent system.

15. Find $h$ such that

\[
\left(\begin{array}{cc|c} 1 & h & 3 \\ 2 & 4 & 6 \end{array}\right)
\]

is the augmented matrix of a consistent system.

16. Find $h$ such that

\[
\left(\begin{array}{cc|c} 1 & 1 & 4 \\ 3 & h & 12 \end{array}\right)
\]

is the augmented matrix of a consistent system.

17. Choose $h$ and $k$ such that the augmented matrix shown has one solution. Then choose $h$ and
$k$ such that the system has no solutions. Finally, choose $h$ and $k$ such that the system has
infinitely many solutions.

\[
\left(\begin{array}{cc|c} 1 & h & 2 \\ 2 & 4 & k \end{array}\right)
\]

18. Choose $h$ and $k$ such that the augmented matrix shown has one solution. Then choose $h$ and
$k$ such that the system has no solutions. Finally, choose $h$ and $k$ such that the system has
infinitely many solutions.

\[
\left(\begin{array}{cc|c} 1 & 2 & 2 \\ 2 & h & k \end{array}\right)
\]

19. Determine if the system is consistent. If so, is the solution unique?

\[
\begin{aligned}
x + 2y + z - w &= 2 \\
x - y + z + w &= 1 \\
2x + y - z &= 1 \\
4x + 2y + z &= 5
\end{aligned}
\]

20. Determine if the system is consistent. If so, is the solution unique?

\[
\begin{aligned}
x + 2y + z - w &= 2 \\
x - y + z + w &= 0 \\
2x + y - z &= 1 \\
4x + 2y + z &= 3
\end{aligned}
\]

21. Find the general solution of the system whose augmented matrix is 




22. Find the general solution of the system whose augmented matrix is

23. Find the general solution of the system whose augmented matrix is



1 1 
1 4 



24. Find the general solution of the system whose augmented matrix is 

/ 1 2 1 1 | 2 \ 

1 1 2 | 1 

1 2 1 | 3 
\ 1 1 2 | 2 j 

25. Find the general solution of the system whose augmented matrix is 

( 1 2 1 1 | 2 \ 

1 1 2 | 1 

2 1 | 3 

\ 1 -1 2 2 2 | j 

26. Give the complete solution to the system of equations, $7x + 14y + 15z = 22$, $2x + 4y + 3z = 5$,
and $3x + 6y + 10z = 13$.

27. Give the complete solution to the system of equations, $3x - y + 4z = 6$, $y + 8z = 0$, and
$-2x + y = -4$.

28. Give the complete solution to the system of equations, $9x - 2y + 4z = -17$, $13x - 3y + 6z = -25$,
and $-2x - z = 3$.

29. Give the complete solution to the system of equations, $65x + 84y + 16z = 546$, $81x + 105y + 20z = 682$,
and $84x + 110y + 21z = 713$.

30. Give the complete solution to the system of equations, $8x + 2y + 3z = -3$, $8x + 3y + 3z = -1$,
and $4x + y + 3z = -9$.

31. Give the complete solution to the system of equations, $-8x + 2y + 5z = 18$, $-8x + 3y + 5z = 13$,
and $-4x + y + 5z = 19$.

32. Give the complete solution to the system of equations, $3x - y - 2z = 3$, $y - 4z = 0$, and
$-2x + y = -2$.

33. Give the complete solution to the system of equations, $-9x + 15y = 66$, $-11x + 18y = 79$,
$-x + y = 4$, and $z = 3$.

34. Give the complete solution to the system of equations, $-19x + 8y = -108$, $-71x + 30y = -404$,
$-2x + y = -12$, $4x + z = 14$.

35. Consider the system $-5x + 2y - z = 0$ and $-5x - 2y - z = 0$. Both equations equal zero and so
$-5x + 2y - z = -5x - 2y - z$ which is equivalent to $y = 0$. Thus $x$ and $z$ can equal anything.
But when $x = 1$, $z = -4$, and $y = 0$ are plugged in to the equations, it doesn't work. Why?

36. Four times the weight of Gaston is 150 pounds more than the weight of Ichabod. Four times 
the weight of Ichabod is 660 pounds less than seventeen times the weight of Gaston. Four 
times the weight of Gaston plus the weight of Siegfried equals 290 pounds. Brunhilde would 
balance all three of the others. Find the weights of the four sisters.

37. The steady state temperature, $u$ in a plate solves Laplace's equation, $\Delta u = 0$. One way to
approximate the solution which is often used is to divide the plate into a square mesh and
require the temperature at each node to equal the average of the temperature at the four
adjacent nodes. This procedure is justified by the mean value property of harmonic functions.
In the following picture, the numbers represent the observed temperature at the indicated
nodes. Your task is to find the temperature at the interior nodes, indicated by $x$, $y$, $z$, and $w$.
One of the equations is $z = \frac{1}{4}(10 + 0 + w + x)$.

[Figure: square mesh with interior nodes x, y, z, and w; the boundary temperatures shown are 30, 30, 20, 20, 10, 10, 0, and 0.]

5.1 Matrix Arithmetic

5.1.1 Addition And Scalar Multiplication Of Matrices 

You have now solved systems of equations by writing them in terms of an augmented matrix and then 
doing row operations on this augmented matrix. It turns out such rectangular arrays of numbers 
are important from many other different points of view. Numbers are also called scalars. In these 
notes numbers will always be either real or complex numbers. I will refer to the set of numbers as 
F sometimes when it is not important to worry about whether the number is real or complex. Thus 
F can be either the real numbers, R or the complex numbers, C. 

A matrix is a rectangular array of numbers. Several of them are referred to as matrices. For
example, here is a matrix.

\[
\begin{pmatrix}
1 & 2 & 3 & 4 \\
5 & 2 & 8 & 7 \\
6 & -9 & 1 & 2
\end{pmatrix}
\]

The size or dimension of a matrix is defined as $m \times n$ where $m$ is the number of rows and $n$ is
the number of columns. The above matrix is a $3 \times 4$ matrix because there are three rows and four
columns. The first row is $(1 \ 2 \ 3 \ 4)$, the second row is $(5 \ 2 \ 8 \ 7)$ and so forth. The first column is
$\begin{pmatrix} 1 \\ 5 \\ 6 \end{pmatrix}$. When specifying the size of a matrix, you always list the number of rows before the number
of columns. Also, you can remember the columns are like columns in a Greek temple. They stand
upright while the rows just lie there like rows made by a tractor in a plowed field. Elements of the
matrix are identified according to position in the matrix. For example, 8 is in position 2, 3 because
it is in the second row and the third column. You might remember that you always list the rows
before the columns by using the phrase Rowman Catholic. The symbol, $(a_{ij})$ refers to a matrix.
The entry in the $i^{th}$ row and the $j^{th}$ column of this matrix is denoted by $a_{ij}$. Using this notation
on the above matrix, $a_{23} = 8$, $a_{32} = -9$, $a_{12} = 2$, etc.

There are various operations which are done on matrices. Matrices can be added, multiplied by
a scalar, and multiplied by other matrices. To illustrate scalar multiplication, consider the following
example in which a matrix is being multiplied by the scalar 3.

\[
3 \begin{pmatrix}
1 & 2 & 3 & 4 \\
5 & 2 & 8 & 7 \\
6 & -9 & 1 & 2
\end{pmatrix}
=
\begin{pmatrix}
3 & 6 & 9 & 12 \\
15 & 6 & 24 & 21 \\
18 & -27 & 3 & 6
\end{pmatrix}
\]

The new matrix is obtained by multiplying every entry of the original matrix by the given scalar. If
$A$ is an $m \times n$ matrix, $-A$ is defined to equal $(-1)A$.

Two matrices must be the same size to be added. The sum of two matrices is a matrix which is
obtained by adding the corresponding entries. Thus

\[
\begin{pmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 2 \end{pmatrix}
+
\begin{pmatrix} -1 & 4 \\ 2 & 8 \\ 6 & -4 \end{pmatrix}
=
\begin{pmatrix} 0 & 6 \\ 5 & 12 \\ 11 & -2 \end{pmatrix}.
\]

Two matrices are equal exactly when they are the same size and the corresponding entries are
identical. Thus

\[
\begin{pmatrix} 0 & 0 \\ 0 & 0 \\ 0 & 0 \end{pmatrix}
\neq
\begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}
\]

because they are different sizes. As noted above, you write $(c_{ij})$ for the matrix $C$ whose $ij^{th}$ entry is
$c_{ij}$. In doing arithmetic with matrices you must define what happens in terms of the $c_{ij}$ sometimes
called the entries of the matrix or the components of the matrix.

The above discussion stated for general matrices is given in the following definition. 

Definition 5.1.1 (Scalar Multiplication) If $A = (a_{ij})$ and $k$ is a scalar, then $kA = (ka_{ij})$.

Example 5.1.2

\[
7 \begin{pmatrix} 2 & 0 \\ 1 & -4 \end{pmatrix} = \begin{pmatrix} 14 & 0 \\ 7 & -28 \end{pmatrix}.
\]

Definition 5.1.3 (Addition) If $A = (a_{ij})$ and $B = (b_{ij})$ are two $m \times n$ matrices, then $A + B = C$
where

\[
C = (c_{ij})
\]

for $c_{ij} = a_{ij} + b_{ij}$.

Example 5.1.4

\[
\begin{pmatrix} 1 & 2 & 3 \\ 1 & 0 & 4 \end{pmatrix}
+
\begin{pmatrix} 5 & 2 & 3 \\ -6 & 2 & 1 \end{pmatrix}
=
\begin{pmatrix} 6 & 4 & 6 \\ -5 & 2 & 5 \end{pmatrix}
\]

To save on notation, we will often use $A_{ij}$ to refer to the $ij^{th}$ entry of the matrix $A$.

Definition 5.1.5 (The zero matrix) The $m \times n$ zero matrix is the $m \times n$ matrix having every entry
equal to zero. It is denoted by 0.

Example 5.1.6 The $2 \times 3$ zero matrix is $\begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}$.

Note there are $2 \times 3$ zero matrices, $3 \times 4$ zero matrices, etc. In fact there is a zero matrix for
every size.

Definition 5.1.7 (Equality of matrices) Let $A$ and $B$ be two matrices. Then $A = B$ means that
the two matrices are of the same size and for $A = (a_{ij})$ and $B = (b_{ij})$, $a_{ij} = b_{ij}$ for all $1 \le i \le m$
and $1 \le j \le n$.

The following properties of matrices can be easily verified. You should do so. 

• Commutative Law Of Addition. 

A + B = B + A, (5.1) 

• Associative Law for Addition. 

(A + B) + C = A + (B + C), (5.2)

• Existence of an Additive Identity

A + 0 = A, (5.3)

• Existence of an Additive Inverse

A + (−A) = 0, (5.4)

Also for α, β scalars, the following additional properties hold.

• Distributive law over Matrix Addition.

α (A + B) = αA + αB, (5.5)

• Distributive law over Scalar Addition

(α + β) A = αA + βA, (5.6)

• Associative law for Scalar Multiplication

α (βA) = αβ (A), (5.7)

• Rule for Multiplication by 1.

1A = A. (5.8)
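
The properties 5.1 through 5.8 can be spot-checked numerically. The Python sketch below is an illustration, not a proof: it represents matrices as lists of rows, the helper names `add` and `scale` are hypothetical, $A$ and $B$ are the matrices of Example 5.1.4, and the third matrix and the scalars are arbitrary choices.

```python
def add(A, B):
    """Entrywise sum of two matrices of the same size."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scale(k, A):
    """Multiply every entry of A by the scalar k."""
    return [[k * a for a in row] for row in A]

A = [[1, 2, 3], [1, 0, 4]]
B = [[5, 2, 3], [-6, 2, 1]]
C = [[0, -1, 2], [7, 3, -2]]   # arbitrary third matrix, chosen for illustration
Z = [[0, 0, 0], [0, 0, 0]]     # the 2 x 3 zero matrix
a, b = 3, -2                   # arbitrary scalars

laws = {
    "5.1": add(A, B) == add(B, A),
    "5.2": add(add(A, B), C) == add(A, add(B, C)),
    "5.3": add(A, Z) == A,
    "5.4": add(A, scale(-1, A)) == Z,
    "5.5": scale(a, add(A, B)) == add(scale(a, A), scale(a, B)),
    "5.6": scale(a + b, A) == add(scale(a, A), scale(b, A)),
    "5.7": scale(a, scale(b, A)) == scale(a * b, A),
    "5.8": scale(1, A) == A,
}
print(all(laws.values()))
```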

As an example, consider the Commutative Law of Addition. Let $A + B = C$ and $B + A = D$.
Why is $D = C$?

\[
C_{ij} = A_{ij} + B_{ij} = B_{ij} + A_{ij} = D_{ij}.
\]

Therefore, $C = D$ because the $ij^{th}$ entries are the same. Note that the conclusion follows from the
commutative law of addition of numbers.

5.1.2 Multiplication Of Matrices

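Definition 5.1.8 Matrices which are $n \times 1$ or $1 \times n$ are called vectors and are often denoted by a
bold letter. Thus the $n \times 1$ matrix

\[
\mathbf{x} = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}
\]

is also called a column vector. The $1 \times n$ matrix

\[
(x_1 \ \cdots \ x_n)
\]

is called a row vector.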

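Although the following description of matrix multiplication may seem strange, it is in fact the
most important and useful of the matrix operations. To begin with consider the case where a matrix
is multiplied by a column vector. First consider a special case.

\[
\begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix}
\begin{pmatrix} 7 \\ 8 \\ 9 \end{pmatrix} = \ ?
\]

One way to remember this is as follows. Slide the vector, placing it on top the two rows as shown
and then do the indicated operation.

\[
\begin{pmatrix}
\overset{7}{1} & \overset{8}{2} & \overset{9}{3} \\
\overset{7}{4} & \overset{8}{5} & \overset{9}{6}
\end{pmatrix}
\to
\begin{pmatrix}
7 \times 1 + 8 \times 2 + 9 \times 3 \\
7 \times 4 + 8 \times 5 + 9 \times 6
\end{pmatrix}
=
\begin{pmatrix} 50 \\ 122 \end{pmatrix}
\]

Multiply the numbers on the top by the numbers on the bottom and add them up to get a single
number for each row of the matrix as shown above.
In more general terms,

\[
\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
=
\begin{pmatrix}
a_{11}x_1 + a_{12}x_2 + a_{13}x_3 \\
a_{21}x_1 + a_{22}x_2 + a_{23}x_3
\end{pmatrix}
\]

Another way to think of this is

\[
x_1 \begin{pmatrix} a_{11} \\ a_{21} \end{pmatrix}
+ x_2 \begin{pmatrix} a_{12} \\ a_{22} \end{pmatrix}
+ x_3 \begin{pmatrix} a_{13} \\ a_{23} \end{pmatrix}
\]

Thus you take $x_1$ times the first column, add to $x_2$ times the second column, and finally $x_3$ times
the third column. In general, here is the definition of how to multiply an $(m \times n)$ matrix times a
$(n \times 1)$ matrix.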

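Definition 5.1.9 Let $A = A_{ij}$ be an $m \times n$ matrix and let $\mathbf{v}$ be an $n \times 1$ matrix,

\[
\mathbf{v} = \begin{pmatrix} v_1 \\ \vdots \\ v_n \end{pmatrix}
\]

Then $A\mathbf{v}$ is an $m \times 1$ matrix and the $i^{th}$ component of this matrix is

\[
(A\mathbf{v})_i = A_{i1}v_1 + A_{i2}v_2 + \cdots + A_{in}v_n = \sum_{j=1}^{n} A_{ij}v_j.
\]

Thus

\[
A\mathbf{v} = \begin{pmatrix} \sum_{j=1}^{n} A_{1j}v_j \\ \vdots \\ \sum_{j=1}^{n} A_{mj}v_j \end{pmatrix}.
\tag{5.9}
\]

In other words, if

\[
A = (\mathbf{a}_1, \cdots, \mathbf{a}_n)
\]

where the $\mathbf{a}_k$ are the columns,

\[
A\mathbf{v} = \sum_{k=1}^{n} v_k \mathbf{a}_k
\]

This follows from 5.9 and the observation that the $j^{th}$ column of $A$ is

\[
\begin{pmatrix} A_{1j} \\ A_{2j} \\ \vdots \\ A_{mj} \end{pmatrix}
\]

so 5.9 reduces to

\[
v_1 \begin{pmatrix} A_{11} \\ A_{21} \\ \vdots \\ A_{m1} \end{pmatrix}
+ v_2 \begin{pmatrix} A_{12} \\ A_{22} \\ \vdots \\ A_{m2} \end{pmatrix}
+ \cdots
+ v_n \begin{pmatrix} A_{1n} \\ A_{2n} \\ \vdots \\ A_{mn} \end{pmatrix}
\]

Note also that multiplication by an $m \times n$ matrix takes an $n \times 1$ matrix, and produces an $m \times 1$
matrix.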
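
The two descriptions of $A\mathbf{v}$ given above, one entry at a time and as a combination of the columns of $A$, can be compared directly. This Python sketch is only an illustration; the function names `matvec_rows` and `matvec_columns` are hypothetical, and the data are the $2 \times 3$ special case worked above.

```python
def matvec_rows(A, v):
    """(Av)_i is the sum over j of A[i][j] * v[j]."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def matvec_columns(A, v):
    """Av computed as v_1 * (first column) + ... + v_n * (last column)."""
    result = [0] * len(A)
    for j, x in enumerate(v):
        column = [row[j] for row in A]
        result = [r + x * c for r, c in zip(result, column)]
    return result

A = [[1, 2, 3], [4, 5, 6]]
v = [7, 8, 9]
print(matvec_rows(A, v), matvec_columns(A, v))
```

Both functions return the same list, which is what the reduction of 5.9 to a sum of columns asserts.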

Here is another example. 

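Example 5.1.10 Compute

\[
\begin{pmatrix} 1 & 2 & 1 & 3 \\ 0 & 2 & 1 & -2 \\ 2 & 1 & 4 & 1 \end{pmatrix}
\begin{pmatrix} 1 \\ 2 \\ 0 \\ 1 \end{pmatrix}.
\]

First of all this is of the form $(3 \times 4)(4 \times 1)$ and so the result should be a $(3 \times 1)$. Note how the
inside numbers cancel. To get the element in the second row and first and only column, compute

\[
\sum_{k=1}^{4} a_{2k}v_k = a_{21}v_1 + a_{22}v_2 + a_{23}v_3 + a_{24}v_4
= 0 \times 1 + 2 \times 2 + 1 \times 0 + (-2) \times 1 = 2.
\]

You should do the rest of the problem and verify

\[
\begin{pmatrix} 1 & 2 & 1 & 3 \\ 0 & 2 & 1 & -2 \\ 2 & 1 & 4 & 1 \end{pmatrix}
\begin{pmatrix} 1 \\ 2 \\ 0 \\ 1 \end{pmatrix}
=
\begin{pmatrix} 8 \\ 2 \\ 5 \end{pmatrix}.
\]
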
The next task is to multiply an $m \times n$ matrix times an $n \times p$ matrix. Before doing so, the following
may be helpful.

For $A$ and $B$ matrices, in order to form the product, $AB$ the number of columns of $A$ must equal
the number of rows of $B$.

\[
(m \times \overbrace{n) \ (n}^{\text{these must match!}} \times p) = m \times p
\]

Note the two outside numbers give the size of the product. Remember:

If the two middle numbers don't match, you can't multiply the matrices!

Definition 5.1.11 When the number of columns of $A$ equals the number of rows of $B$ the two
matrices are said to be conformable and the product, $AB$ is obtained as follows. Let $A$ be an $m \times n$
matrix and let $B$ be an $n \times p$ matrix. Then $B$ is of the form

\[
B = (\mathbf{b}_1, \cdots, \mathbf{b}_p)
\]

where $\mathbf{b}_k$ is an $n \times 1$ matrix or column vector. Then the $m \times p$ matrix $AB$ is defined as follows:

\[
AB = (A\mathbf{b}_1, \cdots, A\mathbf{b}_p) \tag{5.10}
\]

where $A\mathbf{b}_k$ is an $m \times 1$ matrix or column vector which gives the $k^{th}$ column of $AB$.
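
Definition 5.1.11 translates almost literally into code: form each column of the product by multiplying $A$ times the corresponding column of $B$. The Python sketch below is one way to do this, with hypothetical function names and small illustrative matrices that are not taken from the text.

```python
def matvec(A, v):
    """Multiply the matrix A times the column vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def matmul_by_columns(A, B):
    """Build AB one column at a time: column k of AB is A times column k of B."""
    n, p = len(B), len(B[0])
    if any(len(row) != n for row in A):
        raise ValueError("not conformable: columns of A must equal rows of B")
    columns = [matvec(A, [B[i][k] for i in range(n)]) for k in range(p)]
    # Reassemble the list of columns into a list of rows.
    return [list(row) for row in zip(*columns)]

A = [[1, 2], [3, 4]]          # illustrative 2 x 2 matrix
B = [[0, 1, 2], [1, 0, 3]]    # illustrative 2 x 3 matrix
print(matmul_by_columns(A, B))
```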
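Example 5.1.12 Multiply the following.

\[
\begin{pmatrix} 1 & 2 & 1 \\ 0 & 2 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 2 & 0 \\ 0 & 3 & 1 \\ -2 & 1 & 1 \end{pmatrix}
\]

The first thing you need to check before doing anything else is whether it is possible to do the
multiplication. The first matrix is a $2 \times 3$ and the second matrix is a $3 \times 3$. Therefore, it is possible
to multiply these matrices. According to the above discussion it should be a $2 \times 3$ matrix of the
form

\[
\left(
\overbrace{\begin{pmatrix} 1 & 2 & 1 \\ 0 & 2 & 1 \end{pmatrix}\begin{pmatrix} 1 \\ 0 \\ -2 \end{pmatrix}}^{\text{First column}},
\overbrace{\begin{pmatrix} 1 & 2 & 1 \\ 0 & 2 & 1 \end{pmatrix}\begin{pmatrix} 2 \\ 3 \\ 1 \end{pmatrix}}^{\text{Second column}},
\overbrace{\begin{pmatrix} 1 & 2 & 1 \\ 0 & 2 & 1 \end{pmatrix}\begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix}}^{\text{Third column}}
\right)
\]

You know how to multiply a matrix times a vector and so you do so to obtain each of the three
columns. Thus

\[
\begin{pmatrix} 1 & 2 & 1 \\ 0 & 2 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 2 & 0 \\ 0 & 3 & 1 \\ -2 & 1 & 1 \end{pmatrix}
=
\begin{pmatrix} -1 & 9 & 3 \\ -2 & 7 & 3 \end{pmatrix}.
\]

Example 5.1.13 Multiply the following.

\[
\begin{pmatrix} 1 & 2 & 0 \\ 0 & 3 & 1 \\ -2 & 1 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 2 & 1 \\ 0 & 2 & 1 \end{pmatrix}
\]

First check if it is possible. This is of the form $(3 \times 3)(2 \times 3)$. The inside numbers do not
match and so you can't do this multiplication. This means that anything you write will be absolute
nonsense because it is impossible to multiply these matrices in this order. Aren't they the same
two matrices considered in the previous example? Yes they are. It is just that here they are in a
different order. This shows something you must always remember about matrix multiplication.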



Order Matters! 



Matrix Multiplication Is Not Commutative! 



This is very different than multiplication of numbers! 
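
The rule about inside and outside numbers is easy to state as a small function. The Python sketch below uses a hypothetical helper `product_shape` that takes the two sizes as pairs and returns the size of the product, or `None` when the matrices are not conformable in the given order.

```python
def product_shape(shape_a, shape_b):
    """Return the size of AB, or None when the inside numbers do not match."""
    (m, n), (n2, p) = shape_a, shape_b
    return (m, p) if n == n2 else None

print(product_shape((2, 3), (3, 3)))  # sizes in Example 5.1.12
print(product_shape((3, 3), (2, 3)))  # sizes in Example 5.1.13, the other order
```

The first call gives the pair of outside numbers and the second gives `None`, which is the order dependence the two examples illustrate.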



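5.1.3 The $ij^{th}$ Entry Of A Product

It is important to describe matrix multiplication in terms of entries of the matrices. What is the
$ij^{th}$ entry of $AB$? It would be the $i^{th}$ entry of the $j^{th}$ column of $AB$. Thus it would be the $i^{th}$ entry
of $A\mathbf{b}_j$. Now

\[
\mathbf{b}_j = \begin{pmatrix} B_{1j} \\ \vdots \\ B_{nj} \end{pmatrix}
\]

and from the above definition, the $i^{th}$ entry is

\[
\sum_{k=1}^{n} A_{ik}B_{kj}. \tag{5.11}
\]

In terms of pictures of the matrix, you are doing

\[
\begin{pmatrix}
A_{11} & A_{12} & \cdots & A_{1n} \\
A_{21} & A_{22} & \cdots & A_{2n} \\
\vdots & \vdots & & \vdots \\
A_{m1} & A_{m2} & \cdots & A_{mn}
\end{pmatrix}
\begin{pmatrix}
B_{11} & B_{12} & \cdots & B_{1p} \\
B_{21} & B_{22} & \cdots & B_{2p} \\
\vdots & \vdots & & \vdots \\
B_{n1} & B_{n2} & \cdots & B_{np}
\end{pmatrix}
\]

Then as explained above, the $j^{th}$ column is of the form

\[
\begin{pmatrix}
A_{11} & A_{12} & \cdots & A_{1n} \\
A_{21} & A_{22} & \cdots & A_{2n} \\
\vdots & \vdots & & \vdots \\
A_{m1} & A_{m2} & \cdots & A_{mn}
\end{pmatrix}
\begin{pmatrix} B_{1j} \\ B_{2j} \\ \vdots \\ B_{nj} \end{pmatrix}
\]

which is an $m \times 1$ matrix or column vector which equals

\[
\begin{pmatrix} A_{11} \\ A_{21} \\ \vdots \\ A_{m1} \end{pmatrix} B_{1j}
+ \begin{pmatrix} A_{12} \\ A_{22} \\ \vdots \\ A_{m2} \end{pmatrix} B_{2j}
+ \cdots
+ \begin{pmatrix} A_{1n} \\ A_{2n} \\ \vdots \\ A_{mn} \end{pmatrix} B_{nj}.
\]

The second entry of this $m \times 1$ matrix is

\[
A_{21}B_{1j} + A_{22}B_{2j} + \cdots + A_{2n}B_{nj} = \sum_{k=1}^{n} A_{2k}B_{kj}.
\]

Similarly, the $i^{th}$ entry of this $m \times 1$ matrix is

\[
A_{i1}B_{1j} + A_{i2}B_{2j} + \cdots + A_{in}B_{nj} = \sum_{k=1}^{n} A_{ik}B_{kj}.
\]

This shows the following definition for matrix multiplication in terms of the $ij^{th}$ entries of the
product coincides with Definition 5.1.11.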

Definition 5.1.14 Let $A = (A_{ij})$ be an $m \times n$ matrix and let $B = (B_{ij})$ be an $n \times p$ matrix. Then
$AB$ is an $m \times p$ matrix and

\[
(AB)_{ij} = \sum_{k=1}^{n} A_{ik}B_{kj}. \tag{5.12}
\]

Another way to write this is

\[
(AB)_{ij} = \begin{pmatrix} A_{i1} & A_{i2} & \cdots & A_{in} \end{pmatrix}
\begin{pmatrix} B_{1j} \\ B_{2j} \\ \vdots \\ B_{nj} \end{pmatrix}
\]

Note that to get $(AB)_{ij}$ you involve the $i^{th}$ row of $A$ and the $j^{th}$ column of $B$. Specifically, the $ij^{th}$
entry of $AB$ is the dot product of the $i^{th}$ row of $A$ with the $j^{th}$ column of $B$. This is what the formula
in 5.12 says. (Note that here the dot product does not involve taking conjugates.)



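Example 5.1.15 Multiply if possible $\begin{pmatrix} 1 & 2 \\ 3 & 1 \\ 2 & 6 \end{pmatrix} \begin{pmatrix} 2 & 3 & 1 \\ 7 & 6 & 2 \end{pmatrix}$.

First check to see if this is possible. It is of the form $(3 \times 2)(2 \times 3)$ and since the inside numbers
match, the two matrices are conformable and it is possible to do the multiplication. The result
should be a $3 \times 3$ matrix. The answer is of the form

\[
\left(
\begin{pmatrix} 1 & 2 \\ 3 & 1 \\ 2 & 6 \end{pmatrix}\begin{pmatrix} 2 \\ 7 \end{pmatrix},
\begin{pmatrix} 1 & 2 \\ 3 & 1 \\ 2 & 6 \end{pmatrix}\begin{pmatrix} 3 \\ 6 \end{pmatrix},
\begin{pmatrix} 1 & 2 \\ 3 & 1 \\ 2 & 6 \end{pmatrix}\begin{pmatrix} 1 \\ 2 \end{pmatrix}
\right)
\]

where the commas separate the columns in the resulting product. Thus the above product equals

\[
\begin{pmatrix} 16 & 15 & 5 \\ 13 & 15 & 5 \\ 46 & 42 & 14 \end{pmatrix},
\]

a $3 \times 3$ matrix as desired. In terms of the $ij^{th}$ entries and the above definition, the entry in the third
row and second column of the product should equal

\[
\sum_{k} a_{3k}b_{k2} = a_{31}b_{12} + a_{32}b_{22} = 2 \times 3 + 6 \times 6 = 42.
\]

You should try a few more such examples to verify the above definition in terms of the $ij^{th}$ entries
works for other entries.

Example 5.1.16 Multiply if possible $\begin{pmatrix} 1 & 2 \\ 3 & 1 \\ 2 & 6 \end{pmatrix} \begin{pmatrix} 2 & 3 & 1 \\ 7 & 6 & 2 \\ 0 & 0 & 0 \end{pmatrix}$.

This is not possible because it is of the form $(3 \times 2)(3 \times 3)$ and the middle numbers don't match.
In other words the two matrices are not conformable in the indicated order.

Example 5.1.17 Multiply if possible $\begin{pmatrix} 2 & 3 & 1 \\ 7 & 6 & 2 \\ 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ 3 & 1 \\ 2 & 6 \end{pmatrix}$.

This is possible because in this case it is of the form $(3 \times 3)(3 \times 2)$ and the middle numbers do
match so the matrices are conformable. When the multiplication is done it equals

\[
\begin{pmatrix} 13 & 13 \\ 29 & 32 \\ 0 & 0 \end{pmatrix}.
\]

Check this and be sure you come up with the same answer.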
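
The entry formula 5.12 can be checked on Example 5.1.15 with a few lines of Python. This is a sketch under the assumption that matrices are stored as lists of rows; the function name `matmul` is hypothetical.

```python
def matmul(A, B):
    """(AB)_ij is the sum over k of A[i][k] * B[k][j]."""
    n = len(B)
    if any(len(row) != n for row in A):
        raise ValueError("matrices are not conformable in this order")
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(len(B[0]))]
            for i in range(len(A))]

A = [[1, 2], [3, 1], [2, 6]]
B = [[2, 3, 1], [7, 6, 2]]
AB = matmul(A, B)
print(AB)
print(AB[2][1])  # third row, second column (Python indices start at 0)
```

Calling `matmul(A, A)` raises the error, since a $3 \times 2$ matrix cannot be multiplied by a $3 \times 2$ matrix.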

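Example 5.1.18 Multiply if possible $\begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix} \begin{pmatrix} 1 & 2 & 1 & 0 \end{pmatrix}$.

In this case you are trying to do $(3 \times 1)(1 \times 4)$. The inside numbers match so you can do it.
Verify

\[
\begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix} \begin{pmatrix} 1 & 2 & 1 & 0 \end{pmatrix}
=
\begin{pmatrix} 1 & 2 & 1 & 0 \\ 2 & 4 & 2 & 0 \\ 1 & 2 & 1 & 0 \end{pmatrix}
\]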

5.1.4 Properties Of Matrix Multiplication 

As pointed out above, sometimes it is possible to multiply matrices in one order but not in the other 
order. What if it makes sense to multiply them in either order? Will the two products be equal 
then? 

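Example 5.1.19 Compare $\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$ and $\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}$.

The first product is

\[
\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}
= \begin{pmatrix} 2 & 1 \\ 4 & 3 \end{pmatrix}.
\]

The second product is

\[
\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}
= \begin{pmatrix} 3 & 4 \\ 1 & 2 \end{pmatrix}.
\]

You see these are not equal. Again you cannot conclude that $AB = BA$ for matrix multiplication
even when multiplication is defined in both orders. However, there are some properties which do
hold.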

Proposition 5.1.20 If all multiplications and additions make sense, the following hold for matrices,
$A, B, C$ and $a, b$ scalars.

\[
A(aB + bC) = a(AB) + b(AC) \tag{5.13}
\]

\[
(B + C)A = BA + CA \tag{5.14}
\]

\[
A(BC) = (AB)C \tag{5.15}
\]

Proof: Using Definition 5.1.14,

\[
\begin{aligned}
(A(aB + bC))_{ij} &= \sum_{k} A_{ik}(aB + bC)_{kj} \\
&= \sum_{k} A_{ik}(aB_{kj} + bC_{kj}) \\
&= a\sum_{k} A_{ik}B_{kj} + b\sum_{k} A_{ik}C_{kj} \\
&= a(AB)_{ij} + b(AC)_{ij} \\
&= (a(AB) + b(AC))_{ij}.
\end{aligned}
\]

Thus $A(aB + bC) = a(AB) + b(AC)$ as claimed. Formula 5.14 is entirely similar.

Formula 5.15 is the associative law of multiplication. Using Definition 5.1.14,

\[
\begin{aligned}
(A(BC))_{ij} &= \sum_{k} A_{ik}(BC)_{kj} \\
&= \sum_{k} A_{ik}\sum_{l} B_{kl}C_{lj} \\
&= \sum_{l} (AB)_{il}C_{lj} \\
&= ((AB)C)_{ij}.
\end{aligned}
\]

This proves 5.15. ■
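
A numerical spot check of Example 5.1.19 and of formulas 5.13 through 5.15 may help make them concrete. The Python sketch below is an illustration rather than a proof; the helper names are hypothetical, $A$ and $B$ are the matrices of Example 5.1.19, and the third matrix and the scalars are arbitrary choices.

```python
def matmul(A, B):
    """Each entry is a row of A dotted with a column of B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scale(k, A):
    return [[k * a for a in row] for row in A]

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
C = [[2, -1], [5, 3]]   # arbitrary third matrix for illustration
a, b = 2, -3            # arbitrary scalars

print(matmul(A, B), matmul(B, A))   # the two products of Example 5.1.19
commutes = matmul(A, B) == matmul(B, A)
law_5_13 = (matmul(A, add(scale(a, B), scale(b, C)))
            == add(scale(a, matmul(A, B)), scale(b, matmul(A, C))))
law_5_14 = matmul(add(B, C), A) == add(matmul(B, A), matmul(C, A))
law_5_15 = matmul(A, matmul(B, C)) == matmul(matmul(A, B), C)
print(commutes, law_5_13, law_5_14, law_5_15)
```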

5.1.5 The Transpose 

Another important operation on matrices is that of taking the transpose. The following example 
shows what is meant by this operation, denoted by placing a T as an exponent on the matrix. 

\[
\begin{pmatrix} 1 & 4 \\ 3 & 1 \\ 2 & 6 \end{pmatrix}^{T}
=
\begin{pmatrix} 1 & 3 & 2 \\ 4 & 1 & 6 \end{pmatrix}
\]

What happened? The first column became the first row and the second column became the second
row. Thus the $3 \times 2$ matrix became a $2 \times 3$ matrix. The number 3 was in the second row and the
first column and it ended up in the first row and second column. Here is the definition.

Definition 5.1.21 Let $A$ be an $m \times n$ matrix. Then $A^T$ denotes the $n \times m$ matrix which is defined
as follows.

\[
(A^T)_{ij} = A_{ji}
\]

Example 5.1.22

\[
\begin{pmatrix} 1 & 2 & -6 \\ 3 & 5 & 4 \end{pmatrix}^{T}
=
\begin{pmatrix} 1 & 3 \\ 2 & 5 \\ -6 & 4 \end{pmatrix}.
\]

The transpose of a matrix has the following important properties.

Lemma 5.1.23 Let $A$ be an $m \times n$ matrix and let $B$ be a $n \times p$ matrix. Then

\[
(AB)^T = B^T A^T \tag{5.16}
\]

and if $\alpha$ and $\beta$ are scalars,

\[
(\alpha A + \beta B)^T = \alpha A^T + \beta B^T \tag{5.17}
\]

Proof: From the definition,

\[
\begin{aligned}
((AB)^T)_{ij} &= (AB)_{ji} \\
&= \sum_{k} A_{jk}B_{ki} \\
&= \sum_{k} (B^T)_{ik}(A^T)_{kj} \\
&= (B^T A^T)_{ij}.
\end{aligned}
\]

The proof of Formula 5.17 is left as an exercise. ■
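
Formula 5.16 can be tried out on a small case. In this Python sketch the function names are hypothetical, the first matrix is the one from Example 5.1.22, and the second is an arbitrary $3 \times 2$ matrix chosen so that the product makes sense.

```python
def transpose(A):
    """(A^T)_ij = A_ji: the columns of A become the rows of A^T."""
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

A = [[1, 2, -6], [3, 5, 4]]       # the matrix of Example 5.1.22
B = [[1, 0], [2, 1], [0, -1]]     # an arbitrary 3 x 2 matrix for illustration
print(transpose(A))
print(transpose(matmul(A, B)) == matmul(transpose(B), transpose(A)))
```

Note the reversed order on the right: $A^T B^T$ would be a $3 \times 3$ matrix here, while $(AB)^T$ is $2 \times 2$.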

Definition 5.1.24 An $n \times n$ matrix $A$ is said to be symmetric if $A = A^T$. It is said to be skew
symmetric if $A = -A^T$.

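Example 5.1.25 Let

\[
A = \begin{pmatrix} 2 & 1 & 3 \\ 1 & 5 & -3 \\ 3 & -3 & 7 \end{pmatrix}.
\]

Then $A$ is symmetric.

Example 5.1.26 Let

\[
A = \begin{pmatrix} 0 & 1 & 3 \\ -1 & 0 & 2 \\ -3 & -2 & 0 \end{pmatrix}.
\]

Then $A$ is skew symmetric.

5.1.6 The Identity And Inverses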

There is a special matrix called $I$ and referred to as the identity matrix. It is always a square matrix,
meaning the number of rows equals the number of columns and it has the property that there are
ones down the main diagonal and zeroes elsewhere. Here are some identity matrices of various sizes.

\[
(1), \quad
\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad
\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
\]

The first is the $1 \times 1$ identity matrix, the second is the $2 \times 2$ identity matrix, the third is the $3 \times 3$
identity matrix, and the fourth is the $4 \times 4$ identity matrix. By extension, you can likely see what
the $n \times n$ identity matrix would be. It is so important that there is a special symbol to denote the
$ij^{th}$ entry of the identity matrix

\[
I_{ij} = \delta_{ij}
\]

where $\delta_{ij}$ is the Kronecker symbol defined by

\[
\delta_{ij} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases}
\]

It is called the identity matrix because it is a multiplicative identity in the following sense.

Lemma 5.1.27 Suppose $A$ is an $m \times n$ matrix and $I_n$ is the $n \times n$ identity matrix. Then $AI_n = A$.
If $I_m$ is the $m \times m$ identity matrix, it also follows that $I_m A = A$.

Proof:

\[
(AI_n)_{ij} = \sum_{k} A_{ik}\delta_{kj} = A_{ij}
\]

and so $AI_n = A$. The other case is left as an exercise for you. ■
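
The Kronecker symbol gives a one-line way to build the identity matrix in code. The Python sketch below, with hypothetical helper names and an arbitrary $2 \times 3$ matrix, checks both statements of Lemma 5.1.27 on one example.

```python
def identity(n):
    """I_ij is 1 when i == j and 0 otherwise (the Kronecker symbol)."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

A = [[1, 2, 3], [4, 5, 6]]   # an arbitrary 2 x 3 matrix for illustration
print(matmul(A, identity(3)) == A, matmul(identity(2), A) == A)
```

Note that the two identity matrices have different sizes, $3 \times 3$ on the right of $A$ and $2 \times 2$ on the left.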

Definition 5.1.28 An nx n matrix A has an inverse, A~ x if and only if AA~ X = A~ x A = I. 
Such a matrix is called invertible. 

It is very important to observe that the inverse of a matrix, if it exists, is unique. Another way 
to think of this is that if it acts like the inverse, then it is the inverse. 

Theorem 5.1.29 Suppose $A^{-1}$ exists and $AB = BA = I$. Then $B = A^{-1}$.

Proof:

$$A^{-1} = A^{-1} I = A^{-1}(AB) = \left(A^{-1} A\right) B = IB = B. \ ■$$

Unlike ordinary multiplication of numbers, it can happen that $A \neq 0$ but $A$ may fail to have an inverse. This is illustrated in the following example.

Example 5.1.30 Let A = ( J . Does A have an inverse? 

One might think A would have an inverse because it does not equal zero. However, 

and if A 1 existed, this could not happen because you could write 

0\ -i//0\\_,_ 1 / i /-l 



o/- 1 ({o))= A -\\ 1 

a contradiction. Thus the answer is that A does not have an inverse. 



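Example 5.1.31 Let $A = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}$. Show $\begin{pmatrix} 2 & -1 \\ -1 & 1 \end{pmatrix}$ is the inverse of $A$.

To check this, multiply

$$\begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix} \begin{pmatrix} 2 & -1 \\ -1 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$

and

$$\begin{pmatrix} 2 & -1 \\ -1 & 1 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$

showing that this matrix is indeed the inverse of $A$.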
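5.1.7 Finding The Inverse Of A Matrix

In the last example, how would you find $A^{-1}$? You wish to find a matrix $\begin{pmatrix} x & z \\ y & w \end{pmatrix}$ such that

$$\begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix} \begin{pmatrix} x & z \\ y & w \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.$$

This requires the solution of the systems of equations,

$$x + y = 1, \quad x + 2y = 0$$

and

$$z + w = 0, \quad z + 2w = 1.$$

Writing the augmented matrix for these two systems gives

$$\left(\begin{array}{cc|c} 1 & 1 & 1 \\ 1 & 2 & 0 \end{array}\right) \qquad (5.18)$$

for the first system and

$$\left(\begin{array}{cc|c} 1 & 1 & 0 \\ 1 & 2 & 1 \end{array}\right) \qquad (5.19)$$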

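for the second. Let's solve the first system. Take $(-1)$ times the first row and add to the second to get

$$\left(\begin{array}{cc|c} 1 & 1 & 1 \\ 0 & 1 & -1 \end{array}\right)$$

Now take $(-1)$ times the second row and add to the first to get

$$\left(\begin{array}{cc|c} 1 & 0 & 2 \\ 0 & 1 & -1 \end{array}\right)$$

Putting in the variables, this says $x = 2$ and $y = -1$.

Now solve the second system, 5.19, to find $z$ and $w$. Take $(-1)$ times the first row and add to the second to get

$$\left(\begin{array}{cc|c} 1 & 1 & 0 \\ 0 & 1 & 1 \end{array}\right)$$

Now take $(-1)$ times the second row and add to the first to get

$$\left(\begin{array}{cc|c} 1 & 0 & -1 \\ 0 & 1 & 1 \end{array}\right)$$

Putting in the variables, this says $z = -1$ and $w = 1$. Therefore, the inverse is

$$\begin{pmatrix} 2 & -1 \\ -1 & 1 \end{pmatrix}$$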

Didn't the above seem rather repetitive? Note that exactly the same row operations were used in both systems. In each case, the end result was something of the form $(I|v)$ where $I$ is the identity and $v$ gave a column of the inverse. In the above, $\begin{pmatrix} x \\ y \end{pmatrix}$, the first column of the inverse, was obtained first and then the second column $\begin{pmatrix} z \\ w \end{pmatrix}$.

To simplify this procedure, you could have written

$$\left(\begin{array}{cc|cc} 1 & 1 & 1 & 0 \\ 1 & 2 & 0 & 1 \end{array}\right)$$

and row reduced till you obtained

$$\left(\begin{array}{cc|cc} 1 & 0 & 2 & -1 \\ 0 & 1 & -1 & 1 \end{array}\right)$$

and read off the inverse as the $2 \times 2$ matrix on the right side.

This is the reason for the following simple procedure for finding the inverse of a matrix. This 
procedure is called the Gauss-Jordan procedure. 

Procedure 5.1.32 Suppose $A$ is an $n \times n$ matrix. To find $A^{-1}$ if it exists, form the augmented $n \times 2n$ matrix

$$(A|I)$$

and then, if possible, do row operations until you obtain an $n \times 2n$ matrix of the form

$$(I|B). \qquad (5.20)$$

When this has been done, $B = A^{-1}$. If it is impossible to row reduce to a matrix of the form $(I|B)$, then $A$ has no inverse.

Actually, all this shows is how to find a right inverse if it exists. Later, I will show that this right inverse is the inverse. See Corollary 7.2.15 or Theorem 8.2.11 presented later.
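The Gauss-Jordan procedure can be sketched in a few lines of Python. This is an illustrative sketch, not code from the book: the function name `gauss_jordan_inverse` is hypothetical, and exact rational arithmetic (`fractions.Fraction`) is an assumption made so that no rounding occurs. The pivoting order is one reasonable choice among many.

```python
from fractions import Fraction

def gauss_jordan_inverse(A):
    """Row reduce (A | I) to (I | B). Return B, or None if A has no inverse."""
    n = len(A)
    # Build the augmented n x 2n matrix (A | I) with exact rational entries.
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        # Look at or below the diagonal for a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return None  # the left half cannot be reduced to I
        M[col], M[pivot] = M[pivot], M[col]       # switch two rows
        p = M[col][col]
        M[col] = [x / p for x in M[col]]          # multiply a row by 1/p
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                # Replace row r by (-f) times the pivot row added to it.
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]  # the right half is the inverse

inv = gauss_jordan_inverse([[1, 1], [1, 2]])
print(inv)
```

Applied to the $2 \times 2$ matrix of the preceding discussion, the right half of the reduced matrix has the entries 2, -1, -1, 1, in agreement with the inverse found by hand. A matrix such as `[[1, 2], [2, 4]]` produces a zero row on the left and the function returns `None`.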

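Example 5.1.33 Let $A = \begin{pmatrix} 1 & 2 & 2 \\ 1 & 0 & 2 \\ 3 & 1 & -1 \end{pmatrix}$. Find $A^{-1}$ if it exists.

Set up the augmented matrix $(A|I)$

$$\left(\begin{array}{ccc|ccc} 1 & 2 & 2 & 1 & 0 & 0 \\ 1 & 0 & 2 & 0 & 1 & 0 \\ 3 & 1 & -1 & 0 & 0 & 1 \end{array}\right)$$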



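Next take $(-1)$ times the first row and add to the second followed by $(-3)$ times the first row added to the last. This yields

$$\left(\begin{array}{ccc|ccc} 1 & 2 & 2 & 1 & 0 & 0 \\ 0 & -2 & 0 & -1 & 1 & 0 \\ 0 & -5 & -7 & -3 & 0 & 1 \end{array}\right)$$

Then take 5 times the second row and add to $-2$ times the last row. (In the display, the second row is also shown multiplied by 5.)

$$\left(\begin{array}{ccc|ccc} 1 & 2 & 2 & 1 & 0 & 0 \\ 0 & -10 & 0 & -5 & 5 & 0 \\ 0 & 0 & 14 & 1 & 5 & -2 \end{array}\right)$$

Next take the last row and add to $(-7)$ times the top row. This yields

$$\left(\begin{array}{ccc|ccc} -7 & -14 & 0 & -6 & 5 & -2 \\ 0 & -10 & 0 & -5 & 5 & 0 \\ 0 & 0 & 14 & 1 & 5 & -2 \end{array}\right)$$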
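Now take $(-7/5)$ times the second row and add to the top.

$$\left(\begin{array}{ccc|ccc} -7 & 0 & 0 & 1 & -2 & -2 \\ 0 & -10 & 0 & -5 & 5 & 0 \\ 0 & 0 & 14 & 1 & 5 & -2 \end{array}\right)$$

Finally divide the top row by $-7$, the second row by $-10$ and the bottom row by 14 which yields

$$\left(\begin{array}{ccc|ccc} 1 & 0 & 0 & -\frac{1}{7} & \frac{2}{7} & \frac{2}{7} \\ 0 & 1 & 0 & \frac{1}{2} & -\frac{1}{2} & 0 \\ 0 & 0 & 1 & \frac{1}{14} & \frac{5}{14} & -\frac{1}{7} \end{array}\right)$$

Therefore, the inverse is

$$\begin{pmatrix} -\frac{1}{7} & \frac{2}{7} & \frac{2}{7} \\ \frac{1}{2} & -\frac{1}{2} & 0 \\ \frac{1}{14} & \frac{5}{14} & -\frac{1}{7} \end{pmatrix}$$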

Example 5.1.34 Let A = 

Write the augmented matrix (A\I) 




Find A 1 if it exists. 





and proceed to do row operations attempting to obtain $\left(I|A^{-1}\right)$. Take $(-1)$ times the top row and add to the second. Then take $(-2)$ times the top row and add to the bottom.





Next add (—1) times the second row to the bottom row. 

I i 
I -i 
I -i ■ 

At this point, you can see there will be no inverse because you have obtained a row of zeros in the left half of the augmented matrix $(A|I)$. Thus there will be no way to obtain $I$ on the left.





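Example 5.1.35 Let $A = \begin{pmatrix} 1 & 0 & 1 \\ 1 & -1 & 1 \\ 1 & 1 & -1 \end{pmatrix}$. Find $A^{-1}$ if it exists.

Form the augmented matrix

$$\left(\begin{array}{ccc|ccc} 1 & 0 & 1 & 1 & 0 & 0 \\ 1 & -1 & 1 & 0 & 1 & 0 \\ 1 & 1 & -1 & 0 & 0 & 1 \end{array}\right)$$

Now do row operations until the $n \times n$ matrix on the left becomes the identity matrix. This yields after some computations,

$$\left(\begin{array}{ccc|ccc} 1 & 0 & 0 & 0 & \frac{1}{2} & \frac{1}{2} \\ 0 & 1 & 0 & 1 & -1 & 0 \\ 0 & 0 & 1 & 1 & -\frac{1}{2} & -\frac{1}{2} \end{array}\right)$$

and so the inverse of $A$ is the matrix on the right,

$$\begin{pmatrix} 0 & \frac{1}{2} & \frac{1}{2} \\ 1 & -1 & 0 \\ 1 & -\frac{1}{2} & -\frac{1}{2} \end{pmatrix}$$

Checking the answer is easy. Just multiply the matrices and see if it works.

$$\begin{pmatrix} 1 & 0 & 1 \\ 1 & -1 & 1 \\ 1 & 1 & -1 \end{pmatrix} \begin{pmatrix} 0 & \frac{1}{2} & \frac{1}{2} \\ 1 & -1 & 0 \\ 1 & -\frac{1}{2} & -\frac{1}{2} \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$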

Always check your answer because if you are like some of us, you will usually have made a mistake. 

Example 5.1.36 In this example, it is shown how to use the inverse of a matrix to find the solution 
to a system of equations. Consider the following system of equations. Use the inverse of a suitable 
matrix to give the solutions to this system. 




The system of equations can be written in terms of matrices as 








(5.21) 



More simply, this is of the form $Ax = b$. Suppose you find the inverse of the matrix, $A^{-1}$. Then you could multiply both sides of this equation by $A^{-1}$ to obtain

$$x = \left(A^{-1} A\right) x = A^{-1}(Ax) = A^{-1} b.$$

This gives the solution as $x = A^{-1} b$. Note that once you have found the inverse, you can easily get the solution for different right hand sides without any effort. It is always just $A^{-1} b$. In the given example, the inverse of the matrix is




This was shown in Example 5.1.35. Therefore, from what was just explained, the solution to the 
given system is 

' x\ ( Q k k \ f 1 
What if the right side of 5.21 had been

What would be the solution to








By the above discussion, it is just 

This illustrates why once you have found the inverse of a given matrix, you can use it to solve many 
different systems easily. 
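The point about reusing the inverse can be illustrated with a short sketch. It uses the $2 \times 2$ matrix of Example 5.1.31 and its inverse; the right hand sides in the loop are hypothetical values chosen for illustration, and `matvec` is a helper name introduced here.

```python
def matvec(M, v):
    """Product of a matrix (list of rows) with a column vector (list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

A = [[1, 1], [1, 2]]
A_inv = [[2, -1], [-1, 1]]   # found once, by the Gauss-Jordan procedure

# Each new right hand side b costs only one matrix-vector product.
for b in ([1, 0], [3, 5], [-2, 4]):
    x = matvec(A_inv, b)        # x = A^{-1} b
    assert matvec(A, x) == b    # check that Ax = b really holds
    print(b, "->", x)
```

The `assert` inside the loop follows the advice given earlier in the section: always check the answer by substituting it back.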

5.2 Exercises 

1. Here are some matrices: 

/l23\/3 -1 2 \ 
A ~ \ 2 1 7 )' B ~ \ -3 2 1 J' 

"- 0-*-U ?*)•*- (I 

Find if possible $-3A$, $3B - A$, $AC$, $CB$, $AE$, $EA$. If it is not possible explain why.

2. Here are some matrices: 

3 I \,B=(\-?\ 



l "I 

5 0/' I 4 -3 ' 3 ' 



c - 1 : n.D-f-' '.Vb m 
Find if possible $-3A$, $3B - A$, $AC$, $CA$, $AE$, $EA$, $BE$, $DE$. If it is not possible explain why.
3. Here are some matrices: 



1 2 
A = I 3 2 | ,B = 

1 2 



C 



5 



,D 



2-5 2 
-3 2 1 I ' 



4 -3 )' E= { 3 



Find if possible $-3A^T$, $3B - A^T$, $AC$, $CA$, $AE$, $E^T B$, $BE$, $DE$, $EE^T$, $E^T E$. If it is not possible explain why.



4. Here are some matrices: 

A 



1 2 

3 2), B 

1 -1 



'- 'isw-^w; 



2-5 2 
-3 2 1 




Find the following if possible and explain why it is not possible if this is the case.

$AD$, $DA$, $D^T B$, $D^T BE$, $E^T D$, $DE^T$.



1 



5. Let A = 


ble. 




(a) 


AB 


(b) 


BA 


(c) 


AC 


(d) 


CA 


(e) 


CB 


(f) 


BC 



B 



and C 



1 

-1 
-3 



Find if possi- 



6. Suppose $A$ and $B$ are square matrices of the same size. Which of the following are correct?

(a) $(A - B)^2 = A^2 - 2AB + B^2$

(b) $(AB)^2 = A^2 B^2$

(c) $(A + B)^2 = A^2 + 2AB + B^2$

(d) $(A + B)^2 = A^2 + AB + BA + B^2$

(e) $A^2 B^2 = A(AB)B$

(f) $(A + B)^3 = A^3 + 3A^2 B + 3AB^2 + B^3$

(g) $(A + B)(A - B) = A^2 - B^2$



7. Let A = 



. Find 



all 



2x2 matrices, B such that AB = 0. 



8. Let $x = (-1, -1, 1)$ and $y = (0, 1, 2)$. Find $x^T y$ and $x y^T$ if possible.



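9. Let $A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}$, $B = \begin{pmatrix} 1 & 2 \\ 3 & k \end{pmatrix}$. Is it possible to choose $k$ such that $AB = BA$? If so, what should $k$ equal?

10. Let $A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}$, $B = \begin{pmatrix} 1 & 2 \\ 1 & k \end{pmatrix}$. Is it possible to choose $k$ such that $AB = BA$? If so, what should $k$ equal?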



11. In 5.1 - 5.8 describe -A and 0. 

12. Let $A$ be an $n \times n$ matrix. Show $A$ equals the sum of a symmetric and a skew symmetric matrix. ($M$ is skew symmetric if $M = -M^T$. $M$ is symmetric if $M^T = M$.) Hint: Show that $\frac{1}{2}\left(A^T + A\right)$ is symmetric and then consider using this as one of the matrices.

13. Show every skew symmetric matrix has all zeros down the main diagonal. The main diagonal consists of every entry of the matrix which is of the form $a_{ii}$. It runs from the upper left down to the lower right.
14. Suppose $M$ is a $3 \times 3$ skew symmetric matrix. Show there exists a vector $\Omega$ such that for all $u \in \mathbb{R}^3$

$$Mu = \Omega \times u$$

Hint: Explain why, since $M$ is skew symmetric it is of the form

$$M = \begin{pmatrix} 0 & -\omega_3 & \omega_2 \\ \omega_3 & 0 & -\omega_1 \\ -\omega_2 & \omega_1 & 0 \end{pmatrix}$$

where the $\omega_i$ are numbers. Then consider $\omega_1 \mathbf{i} + \omega_2 \mathbf{j} + \omega_3 \mathbf{k}$.

15. Using only the properties 5.1 - 5.8 show —A is unique. 

16. Using only the properties 5.1 - 5.8 show $0$ is unique.

17. Using only the properties 5.1 - 5.8 show $0A = 0$. Here the $0$ on the left is the scalar $0$ and the $0$ on the right is the zero for $m \times n$ matrices.

18. Using only the properties 5.1 - 5.8 and previous problems show ( — 1) A = —A. 

19. Prove 5.17. 

20. Prove that I m A = A where A is an m x n matrix. 

21. Give an example of matrices, $A, B, C$ such that $B \neq C$, $A \neq 0$, and yet $AB = AC$.

22. Suppose $AB = AC$ and $A$ is an invertible $n \times n$ matrix. Does it follow that $B = C$? Explain why or why not. What if $A$ were a non invertible $n \times n$ matrix?

23. Find your own examples: 

(a) $2 \times 2$ matrices, $A$ and $B$ such that $A \neq 0$, $B \neq 0$ with $AB \neq BA$.

(b) $2 \times 2$ matrices, $A$ and $B$ such that $A \neq 0$, $B \neq 0$, but $AB = 0$.

(c) $2 \times 2$ matrices, $A$, $D$, and $C$ such that $A \neq 0$, $C \neq D$, but $AC = AD$.

24. Explain why if $AB = AC$ and $A^{-1}$ exists, then $B = C$.

25. Give an example of a matrix $A$ such that $A^2 = I$ and yet $A \neq I$ and $A \neq -I$.

26. Give an example of matrices, $A, B$ such that neither $A$ nor $B$ equals zero and yet $AB = 0$.

27. Give another example other than the one given in this section of two square matrices, $A$ and $B$ such that $AB \neq BA$.

28. Let

$$A = \begin{pmatrix} 2 & 1 \\ -1 & 3 \end{pmatrix}.$$

Find $A^{-1}$ if possible. If $A^{-1}$ does not exist, determine why.

29. Let 

Find A -1 if possible. If A -1 does not exist, determine why. 

30. Let 

/1_ ' 3 



Find $A^{-1}$ if possible. If $A^{-1}$ does not exist, determine why.

31. Let

$$A = \begin{pmatrix} 2 & 1 \\ 4 & 2 \end{pmatrix}.$$

Find $A^{-1}$ if possible. If $A^{-1}$ does not exist, determine why.

32. Let $A$ be a $2 \times 2$ matrix which has an inverse. Say $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$. Find a formula for $A^{-1}$ in terms of $a, b, c, d$.

33. Let

Find $A^{-1}$ if possible. If $A^{-1}$ does not exist, determine why.

34. Let

Find $A^{-1}$ if possible. If $A^{-1}$ does not exist, determine why.

35. Let

Find $A^{-1}$ if possible. If $A^{-1}$ does not exist, determine why.
36. Let 



A = 



( 1 2 2 \ 
112 
2 1-32 

\ 1 2 1 2 / 

Find $A^{-1}$ if possible. If $A^{-1}$ does not exist, determine why.



37. Write

$$\begin{pmatrix} x_1 - x_2 + 2x_3 \\ 2x_3 + x_1 \\ 3x_3 \\ 3x_4 + 3x_2 + x_1 \end{pmatrix}$$

in the form $A \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix}$ where $A$ is an appropriate matrix.
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38. Write 



39. Write 



( X\+ 3x 2 + 2x 3 \ 

6x 3 

y x 4 + 3x 2 + Xi J 

I Xi + x 2 + x 3 ^ 

2x 3 + xi + x 2 

x 3 -xi 

3X4 + Xi 



V 



in the form A 



in the form A 



J 



( Xl \ 

x 2 
x 3 

\x 4 J 

/ Xi \ 

x 2 

x 3 

\x A ) 



where A is an appropriate matrix. 



where A is an appropriate matrix. 



40. Using the inverse of the matrix, find the solution to the systems 






Now give the solution in terms of $a$, $b$, and $c$ to





41. Using the inverse of the matrix, find the solution to the systems 





Now give the solution in terms of $a$, $b$, and $c$ to





42. Using the inverse of the matrix, find the solution to the system 



3 



1 

! 

2 

1 

2 -2 

Z 4 



1 
2 


l 

4 



_5 
2 
1 
9 



\ 


/ X \ 




(a\ 




y 




b 




z 




c 


/ 


\w J 




\d) 



43. Show that if $A$ is an $n \times n$ invertible matrix and $x$ is an $n \times 1$ matrix such that $Ax = b$ for $b$ an $n \times 1$ matrix, then $x = A^{-1} b$.

44. Prove that if $A^{-1}$ exists and $Ax = 0$ then $x = 0$.

45. Show that if $A^{-1}$ exists for an $n \times n$ matrix, then it is unique. That is, if $BA = I$ and $AB = I$, then $B = A^{-1}$.

46. Show that if $A$ is an invertible $n \times n$ matrix, then so is $A^T$ and $\left(A^T\right)^{-1} = \left(A^{-1}\right)^T$.

47. Show $(AB)^{-1} = B^{-1} A^{-1}$ by verifying that $AB\left(B^{-1} A^{-1}\right) = I$ and $B^{-1} A^{-1}(AB) = I$. Hint: Use Problem 45.

48. Show that $(ABC)^{-1} = C^{-1} B^{-1} A^{-1}$ by verifying that $(ABC)\left(C^{-1} B^{-1} A^{-1}\right) = I$ and $\left(C^{-1} B^{-1} A^{-1}\right)(ABC) = I$. Hint: Use Problem 45.

49. If $A$ is invertible, show $\left(A^2\right)^{-1} = \left(A^{-1}\right)^2$. Hint: Use Problem 45.

50. If $A$ is invertible, show $\left(A^{-1}\right)^{-1} = A$. Hint: Use Problem 45.

51. Let $A$ be a real $m \times n$ matrix and let $x \in \mathbb{R}^n$ and $y \in \mathbb{R}^m$. Show $(Ax, y)_{\mathbb{R}^m} = \left(x, A^T y\right)_{\mathbb{R}^n}$ where $(\cdot, \cdot)_{\mathbb{R}^k}$ denotes the dot product in $\mathbb{R}^k$. In the notation above, $Ax \cdot y = x \cdot A^T y$. Use the definition of matrix multiplication to do this.

52. Use the result of Problem 51 to verify directly that $(AB)^T = B^T A^T$ without making any reference to subscripts.

53. A matrix A is called a projection if A 2 = A. Here is a matrix. 

2 2 

1 1 2 

-1 -1 

Show that this is a projection. Show that a vector in the column space of a projection matrix 
is left unchanged by multiplication by A. 
6.1 Basic Techniques And Properties

6.1.1 Cofactors And 2 x 2 Determinants

Let $A$ be an $n \times n$ matrix. The determinant of $A$, denoted as $\det(A)$, is a number. If the matrix is a $2 \times 2$ matrix, this number is very easy to find.

Definition 6.1.1 Let $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$. Then

$$\det(A) = ad - cb.$$

The determinant is also often denoted by enclosing the matrix with two vertical lines. Thus

$$\det\begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{vmatrix} a & b \\ c & d \end{vmatrix}.$$

Example 6.1.2 Find $\det\begin{pmatrix} 2 & 4 \\ -1 & 6 \end{pmatrix}$.

From the definition this is just $(2)(6) - (-1)(4) = 16$.

Having defined what is meant by the determinant of a $2 \times 2$ matrix, what about a $3 \times 3$ matrix?

Definition 6.1.3 Suppose $A$ is a $3 \times 3$ matrix. The $ij^{th}$ minor, denoted as $\operatorname{minor}(A)_{ij}$, is the determinant of the $2 \times 2$ matrix which results from deleting the $i^{th}$ row and the $j^{th}$ column.

Example 6.1.4 Consider the matrix

$$\begin{pmatrix} 1 & 2 & 3 \\ 4 & 3 & 2 \\ 3 & 2 & 1 \end{pmatrix}$$

The $(1,2)$ minor is the determinant of the $2 \times 2$ matrix which results when you delete the first row and the second column. This minor is therefore

$$\det\begin{pmatrix} 4 & 2 \\ 3 & 1 \end{pmatrix} = -2.$$

The $(2,3)$ minor is the determinant of the $2 \times 2$ matrix which results when you delete the second row and the third column. This minor is therefore

$$\det\begin{pmatrix} 1 & 2 \\ 3 & 2 \end{pmatrix} = -4.$$
Definition 6.1.5 Suppose $A$ is a $3 \times 3$ matrix. The $ij^{th}$ cofactor is defined to be $(-1)^{i+j} \times \left(ij^{th} \text{ minor}\right)$. In words, you multiply $(-1)^{i+j}$ times the $ij^{th}$ minor to get the $ij^{th}$ cofactor. The cofactors of a matrix are so important that special notation is appropriate when referring to them. The $ij^{th}$ cofactor of a matrix $A$ will be denoted by $\operatorname{cof}(A)_{ij}$. It is also convenient to refer to the cofactor of an entry of a matrix as follows. For $a_{ij}$ an entry of the matrix, its cofactor is just $\operatorname{cof}(A)_{ij}$. Thus the cofactor of the $ij^{th}$ entry is just the $ij^{th}$ cofactor.

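Example 6.1.6 Consider the matrix

$$A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 3 & 2 \\ 3 & 2 & 1 \end{pmatrix}$$

The $(1,2)$ minor is the determinant of the $2 \times 2$ matrix which results when you delete the first row and the second column. This minor is therefore

$$\det\begin{pmatrix} 4 & 2 \\ 3 & 1 \end{pmatrix} = -2.$$

It follows

$$\operatorname{cof}(A)_{12} = (-1)^{1+2} \det\begin{pmatrix} 4 & 2 \\ 3 & 1 \end{pmatrix} = (-1)^{1+2}(-2) = 2.$$

The $(2,3)$ minor is the determinant of the $2 \times 2$ matrix which results when you delete the second row and the third column. This minor is therefore

$$\det\begin{pmatrix} 1 & 2 \\ 3 & 2 \end{pmatrix} = -4.$$

Therefore,

$$\operatorname{cof}(A)_{23} = (-1)^{2+3} \det\begin{pmatrix} 1 & 2 \\ 3 & 2 \end{pmatrix} = (-1)^{2+3}(-4) = 4.$$

Similarly,

$$\operatorname{cof}(A)_{22} = (-1)^{2+2} \det\begin{pmatrix} 1 & 3 \\ 3 & 1 \end{pmatrix} = -8.$$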


Definition 6.1.7 The determinant of a 3 x 3 matrix A, is obtained by picking a row (column) and 
taking the product of each entry in that row (column) with its cofactor and adding these up. This 
process when applied to the i th row (column) is known as expanding the determinant along the i th 
row (column). 

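Example 6.1.8 Find the determinant of

$$A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 3 & 2 \\ 3 & 2 & 1 \end{pmatrix}$$

Here is how it is done by "expanding along the first column".

$$\det(A) = 1 \underbrace{(-1)^{1+1} \begin{vmatrix} 3 & 2 \\ 2 & 1 \end{vmatrix}}_{\operatorname{cof}(A)_{11}} + 4 \underbrace{(-1)^{2+1} \begin{vmatrix} 2 & 3 \\ 2 & 1 \end{vmatrix}}_{\operatorname{cof}(A)_{21}} + 3 \underbrace{(-1)^{3+1} \begin{vmatrix} 2 & 3 \\ 3 & 2 \end{vmatrix}}_{\operatorname{cof}(A)_{31}} = 0.$$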



You see, we just followed the rule in the above definition. We took the 1 in the first column and 
multiplied it by its cofactor, the 4 in the first column and multiplied it by its cofactor, and the 3 in 
the first column and multiplied it by its cofactor. Then we added these numbers together. 
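You could also expand the determinant along the second row as follows.

$$\det(A) = 4 \underbrace{(-1)^{2+1} \begin{vmatrix} 2 & 3 \\ 2 & 1 \end{vmatrix}}_{\operatorname{cof}(A)_{21}} + 3 \underbrace{(-1)^{2+2} \begin{vmatrix} 1 & 3 \\ 3 & 1 \end{vmatrix}}_{\operatorname{cof}(A)_{22}} + 2 \underbrace{(-1)^{2+3} \begin{vmatrix} 1 & 2 \\ 3 & 2 \end{vmatrix}}_{\operatorname{cof}(A)_{23}} = 0.$$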



Observe this gives the same number. You should try expanding along other rows and columns. If 
you don't make any mistakes, you will always get the same answer. 
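The claim that every row and every column gives the same number can be tried out with a short recursive sketch of Laplace expansion. The function names `minor`, `det`, and `det_along_column` are hypothetical, and treating the determinant of a $1 \times 1$ matrix as its single entry is an assumption made here so the recursion has a base case; it reproduces $ad - cb$ in the $2 \times 2$ case.

```python
def minor(M, i, j):
    """Delete row i and column j (0-based) from the matrix M."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(M) if k != i]

def det(M):
    """Determinant by Laplace expansion along the first row."""
    if len(M) == 1:
        return M[0][0]  # assumed base case: det of a 1 x 1 matrix
    return sum((-1) ** j * M[0][j] * det(minor(M, 0, j))
               for j in range(len(M)))

def det_along_column(M, j):
    """Same determinant, expanded along column j instead."""
    return sum((-1) ** (i + j) * M[i][j] * det(minor(M, i, j))
               for i in range(len(M)))

A = [[1, 2, 3], [4, 3, 2], [3, 2, 1]]
print(det(A), [det_along_column(A, j) for j in range(3)])
```

With 0-based indices the sign $(-1)^{i+j}$ is unchanged, because shifting both indices by one changes the exponent by two.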

What about a 4 x 4 matrix? You know now how to find the determinant of a 3 x 3 matrix. The 
pattern is the same. 

Definition 6.1.9 Suppose $A$ is a $4 \times 4$ matrix. The $ij^{th}$ minor is the determinant of the $3 \times 3$ matrix you obtain when you delete the $i^{th}$ row and the $j^{th}$ column. The $ij^{th}$ cofactor, $\operatorname{cof}(A)_{ij}$, is defined to be $(-1)^{i+j} \times \left(ij^{th} \text{ minor}\right)$. In words, you multiply $(-1)^{i+j}$ times the $ij^{th}$ minor to get the $ij^{th}$ cofactor.

Definition 6.1.10 The determinant of a $4 \times 4$ matrix $A$ is obtained by picking a row (column) and taking the product of each entry in that row (column) with its cofactor and adding these together. This process when applied to the $i^{th}$ row (column) is known as expanding the determinant along the $i^{th}$ row (column).

Example 6.1.11 Find $\det(A)$ where

$$A = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 5 & 4 & 2 & 3 \\ 1 & 3 & 4 & 5 \\ 3 & 4 & 3 & 2 \end{pmatrix}$$

As in the case of a $3 \times 3$ matrix, you can expand this along any row or column. Let's pick the third column. $\det(A) =$

$$3(-1)^{1+3} \begin{vmatrix} 5 & 4 & 3 \\ 1 & 3 & 5 \\ 3 & 4 & 2 \end{vmatrix} + 2(-1)^{2+3} \begin{vmatrix} 1 & 2 & 4 \\ 1 & 3 & 5 \\ 3 & 4 & 2 \end{vmatrix} + 4(-1)^{3+3} \begin{vmatrix} 1 & 2 & 4 \\ 5 & 4 & 3 \\ 3 & 4 & 2 \end{vmatrix} + 3(-1)^{4+3} \begin{vmatrix} 1 & 2 & 4 \\ 5 & 4 & 3 \\ 1 & 3 & 5 \end{vmatrix}.$$

Now you know how to expand each of these $3 \times 3$ matrices along a row or a column. If you do so, you will get $-12$ assuming you make no mistakes. You could expand this matrix along any row or any column and assuming you make no mistakes, you will always get the same thing which is defined to be the determinant of the matrix $A$. This method of evaluating a determinant by expanding along a row or a column is called the method of Laplace expansion.

Note that each of the four terms above involves three terms consisting of determinants of 2 x 2 
matrices and each of these will need 2 terms. Therefore, there will be 4 x 3 x 2 = 24 terms to 
evaluate in order to find the determinant using the method of Laplace expansion. Suppose now you 
have a 10 x 10 matrix and you follow the above pattern for evaluating determinants. By analogy to 
the above, there will be 10! = 3,628,800 terms involved in the evaluation of such a determinant by
Laplace expansion along a row or column. This is a lot of terms. 
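The count of terms follows a simple recursion: an $n \times n$ determinant expands into $n$ determinants of size $(n-1) \times (n-1)$, and a $2 \times 2$ determinant has 2 terms. A small sketch (the function name `laplace_terms` is hypothetical) confirms the numbers quoted above.

```python
import math

def laplace_terms(n):
    """Number of products summed when an n x n determinant (n >= 2)
    is fully expanded by Laplace expansion."""
    return 2 if n == 2 else n * laplace_terms(n - 1)

for n in (3, 4, 10):
    print(n, laplace_terms(n), math.factorial(n))
```

The recursion unwinds to $n \times (n-1) \times \cdots \times 3 \times 2 = n!$, which is why the method is impractical for large matrices.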

In addition to the difficulties just discussed, you should regard the above claim that you always 
get the same answer by picking any row or column with considerable skepticism. It is incredible and 
not at all obvious. However, it requires a little effort to establish it. This is done in the section on 
the theory of the determinant. 

Definition 6.1.12 Let $A = (a_{ij})$ be an $n \times n$ matrix and suppose the determinant of an $(n-1) \times (n-1)$ matrix has been defined. Then a new matrix called the cofactor matrix, $\operatorname{cof}(A)$, is defined by $\operatorname{cof}(A) = (c_{ij})$ where to obtain $c_{ij}$ delete the $i^{th}$ row and the $j^{th}$ column of $A$, take the determinant of the $(n-1) \times (n-1)$ matrix which results (this is called the $ij^{th}$ minor of $A$), and then multiply this number by $(-1)^{i+j}$. Thus $(-1)^{i+j} \times \left(\text{the } ij^{th} \text{ minor}\right)$ equals the $ij^{th}$ cofactor. To make the formulas easier to remember, $\operatorname{cof}(A)_{ij}$ will denote the $ij^{th}$ entry of the cofactor matrix.

With this definition of the cofactor matrix, here is how to define the determinant of an $n \times n$ matrix.

Definition 6.1.13 Let $A$ be an $n \times n$ matrix where $n > 2$ and suppose the determinant of an $(n-1) \times (n-1)$ matrix has been defined. Then

$$\det(A) = \sum_{j=1}^{n} a_{ij} \operatorname{cof}(A)_{ij} = \sum_{i=1}^{n} a_{ij} \operatorname{cof}(A)_{ij}. \qquad (6.1)$$

The first formula consists of expanding the determinant along the $i^{th}$ row and the second expands the determinant along the $j^{th}$ column.

Theorem 6.1.14 Expanding the n x n matrix along any row or column always gives the same 
answer so the above definition is a good definition. 

6.1.2 The Determinant Of A Triangular Matrix 

Notwithstanding the difficulties involved in using the method of Laplace expansion, certain types of 
matrices are very easy to deal with. 

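Definition 6.1.15 A matrix $M$ is upper triangular if $M_{ij} = 0$ whenever $i > j$. Thus such a matrix equals zero below the main diagonal, the entries of the form $M_{ii}$, as shown.

$$\begin{pmatrix} * & * & \cdots & * \\ 0 & * & \ddots & \vdots \\ \vdots & \ddots & \ddots & * \\ 0 & \cdots & 0 & * \end{pmatrix}$$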

A lower triangular matrix is defined similarly as a matrix for which all entries above the main 
diagonal are equal to zero. 

You should verify the following using the above theorem on Laplace expansion. 

Corollary 6.1.16 Let M be an upper (lower) triangular matrix. Then det (M) is obtained by taking 
the product of the entries on the main diagonal. 

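Example 6.1.17 Let

$$A = \begin{pmatrix} 1 & 2 & 3 & 77 \\ 0 & 2 & 6 & 7 \\ 0 & 0 & 3 & 33.7 \\ 0 & 0 & 0 & -1 \end{pmatrix}$$

Find $\det(A)$.

From the above corollary, it suffices to take the product of the diagonal elements. Thus $\det(A) = 1 \times 2 \times 3 \times (-1) = -6$. Without using the corollary, you could expand along the first column. This gives

$$1 \begin{vmatrix} 2 & 6 & 7 \\ 0 & 3 & 33.7 \\ 0 & 0 & -1 \end{vmatrix} + 0(-1)^{2+1} \begin{vmatrix} 2 & 3 & 77 \\ 0 & 3 & 33.7 \\ 0 & 0 & -1 \end{vmatrix} + 0(-1)^{3+1} \begin{vmatrix} 2 & 3 & 77 \\ 2 & 6 & 7 \\ 0 & 0 & -1 \end{vmatrix} + 0(-1)^{4+1} \begin{vmatrix} 2 & 3 & 77 \\ 2 & 6 & 7 \\ 0 & 3 & 33.7 \end{vmatrix}$$

and the only nonzero term in the expansion is

$$1 \begin{vmatrix} 2 & 6 & 7 \\ 0 & 3 & 33.7 \\ 0 & 0 & -1 \end{vmatrix}.$$

Now expand this along the first column to obtain

$$1 \times \left( 2 \times \begin{vmatrix} 3 & 33.7 \\ 0 & -1 \end{vmatrix} + 0(-1)^{2+1} \begin{vmatrix} 6 & 7 \\ 0 & -1 \end{vmatrix} + 0(-1)^{3+1} \begin{vmatrix} 6 & 7 \\ 3 & 33.7 \end{vmatrix} \right) = 1 \times 2 \times \begin{vmatrix} 3 & 33.7 \\ 0 & -1 \end{vmatrix}.$$

Next expand this last determinant along the first column to obtain the above equals

$$1 \times 2 \times 3 \times (-1) = -6$$

which is just the product of the entries down the main diagonal of the original matrix.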
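For a triangular matrix the whole computation collapses to one product, as the following sketch shows for the upper triangular matrix of this example. The helper names `is_upper_triangular` and `triangular_det` are hypothetical.

```python
from math import prod

def is_upper_triangular(M):
    """True when every entry below the main diagonal is zero."""
    return all(M[i][j] == 0 for i in range(len(M)) for j in range(i))

def triangular_det(M):
    """Product of the main diagonal entries (valid for triangular M only)."""
    return prod(M[i][i] for i in range(len(M)))

A = [[1, 2, 3, 77],
     [0, 2, 6, 7],
     [0, 0, 3, 33.7],
     [0, 0, 0, -1]]

if is_upper_triangular(A):
    print(triangular_det(A))
```

The entries above the diagonal, including 77 and 33.7, never enter the product, which matches the hand expansion where they only appear in terms multiplied by zero.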

6.1.3 Properties Of Determinants 

There are many properties satisfied by determinants. Some of these properties have to do with row 
operations. Recall the row operations. 

Definition 6.1.18 The row operations consist of the following 

1. Switch two rows. 

2. Multiply a row by a nonzero number. 

3. Replace a row by a multiple of another row added to itself. 

Theorem 6.1.19 Let $A$ be an $n \times n$ matrix and let $A_1$ be a matrix which results from multiplying some row of $A$ by a scalar $c$. Then $c \det(A) = \det(A_1)$.

Example 6.1.20 Let $A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}$, $A_1 = \begin{pmatrix} 2 & 4 \\ 3 & 4 \end{pmatrix}$. $\det(A) = -2$, $\det(A_1) = -4$.



Theorem 6.1.21 Let $A$ be an $n \times n$ matrix and let $A_1$ be a matrix which results from switching two rows of $A$. Then $\det(A) = -\det(A_1)$. Also, if one row of $A$ is a multiple of another row of $A$, then $\det(A) = 0$.

Example 6.1.22 Let A = ( J and let A x = ( j . det A = -2, det (A ± ) = 2. 

Theorem 6.1.23 Let $A$ be an $n \times n$ matrix and let $A_1$ be a matrix which results from applying row operation 3. That is, you replace some row by a multiple of another row added to itself. Then $\det(A) = \det(A_1)$.
Example 6.1.24 Let $A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}$ and let $A_1 = \begin{pmatrix} 1 & 2 \\ 4 & 6 \end{pmatrix}$. Thus the second row of $A_1$ is one times the first row added to the second row. $\det(A) = -2$ and $\det(A_1) = -2$.

Theorem 6.1.25 In Theorems 6.1.19 - 6.1.23 you can replace the word "row" with the word "column".
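The effect of each of the three row operations on the determinant can be seen on a single $2 \times 2$ matrix. This is only an illustrative sketch: the helper `det2` and the multipliers 2 and 5 are choices made here, not taken from the text.

```python
def det2(M):
    """Determinant of a 2 x 2 matrix ((a, b), (c, d)) as ad - cb."""
    (a, b), (c, d) = M
    return a * d - c * b

A = [[1, 2], [3, 4]]

# Row operation 2: multiply the first row by c = 2 (Theorem 6.1.19).
scaled = [[2 * x for x in A[0]], A[1]]
# Row operation 1: switch the two rows (Theorem 6.1.21).
swapped = [A[1], A[0]]
# Row operation 3: add 5 times the first row to the second (Theorem 6.1.23).
added = [A[0], [x + 5 * y for x, y in zip(A[1], A[0])]]

print(det2(A), det2(scaled), det2(swapped), det2(added))
```

Scaling a row scales the determinant, switching rows flips its sign, and adding a multiple of one row to another leaves it unchanged.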

There are two other major properties of determinants which do not involve row operations. 

Theorem 6.1.26 Let $A$ and $B$ be two $n \times n$ matrices. Then

$$\det(AB) = \det(A) \det(B).$$

Also,

$$\det(A) = \det\left(A^T\right).$$



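Example 6.1.27 Compare $\det(AB)$ and $\det(A) \det(B)$ for

$$A = \begin{pmatrix} 1 & 2 \\ -3 & 2 \end{pmatrix}, \quad B = \begin{pmatrix} 3 & 2 \\ 4 & 1 \end{pmatrix}.$$

First

$$AB = \begin{pmatrix} 1 & 2 \\ -3 & 2 \end{pmatrix} \begin{pmatrix} 3 & 2 \\ 4 & 1 \end{pmatrix} = \begin{pmatrix} 11 & 4 \\ -1 & -4 \end{pmatrix}$$

and so

$$\det(AB) = \det\begin{pmatrix} 11 & 4 \\ -1 & -4 \end{pmatrix} = -40.$$

Now

$$\det(A) = \det\begin{pmatrix} 1 & 2 \\ -3 & 2 \end{pmatrix} = 8$$

and

$$\det(B) = \det\begin{pmatrix} 3 & 2 \\ 4 & 1 \end{pmatrix} = -5.$$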



Download free eBooks at bookboon.com 



131 



Elementary Linear Algebra 



Determinants 



Thus det (A) det (B) = 8 x (-5) = -40. 
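The same comparison can be carried out mechanically. The sketch below uses the two matrices of this example; `det2`, `matmul`, and `transpose` are helper names introduced here for illustration.

```python
def det2(M):
    """Determinant of a 2 x 2 matrix ((a, b), (c, d)) as ad - cb."""
    (a, b), (c, d) = M
    return a * d - c * b

def matmul(A, B):
    """Matrix product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(M):
    return [list(col) for col in zip(*M)]

A = [[1, 2], [-3, 2]]
B = [[3, 2], [4, 1]]
AB = matmul(A, B)

print(AB, det2(AB), det2(A) * det2(B))
print(det2(A) == det2(transpose(A)))
```

Both parts of Theorem 6.1.26 are visible here: the determinant of the product equals the product of the determinants, and transposing does not change the determinant.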

6.1.4 Finding Determinants Using Row Operations 

Theorems 6.1.23 - 6.1.25 can be used to find determinants using row operations. As pointed out 
above, the method of Laplace expansion will not be practical for any matrix of large size. Here is 
an example in which all the row operations are used. 

Example 6.1.28 Find the determinant of the matrix

$$A = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 5 & 1 & 2 & 3 \\ 4 & 5 & 4 & 3 \\ 2 & 2 & -4 & 5 \end{pmatrix}$$

Replace the second row by $(-5)$ times the first row added to it. Then replace the third row by $(-4)$ times the first row added to it. Finally, replace the fourth row by $(-2)$ times the first row added to it. This yields the matrix

$$B = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 0 & -9 & -13 & -17 \\ 0 & -3 & -8 & -13 \\ 0 & -2 & -10 & -3 \end{pmatrix}$$

and from Theorem 6.1.23, it has the same determinant as $A$. Now using other row operations, $\det(B) = \left(\frac{-1}{3}\right) \det(C)$ where

$$C = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 0 & 0 & 11 & 22 \\ 0 & -3 & -8 & -13 \\ 0 & 6 & 30 & 9 \end{pmatrix}$$

The second row was replaced by $(-3)$ times the third row added to the second row. By Theorem 6.1.23 this didn't change the value of the determinant. Then the last row was multiplied by $(-3)$. By Theorem 6.1.19 the resulting matrix has a determinant which is $(-3)$ times the determinant of the un-multiplied matrix. Therefore, we multiplied by $-1/3$ to retain the correct value. Now replace the last row with 2 times the third added to it. This does not change the value of the determinant by Theorem 6.1.23. Finally switch the third and second rows. This causes the determinant to be multiplied by $(-1)$. Thus $\det(C) = -\det(D)$ where

$$D = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 0 & -3 & -8 & -13 \\ 0 & 0 & 11 & 22 \\ 0 & 0 & 14 & -17 \end{pmatrix}$$

You could do more row operations or you could note that this can be easily expanded along the first column followed by expanding the $3 \times 3$ matrix which results along its first column. Thus

$$\det(D) = 1(-3) \begin{vmatrix} 11 & 22 \\ 14 & -17 \end{vmatrix} = 1485$$

and so $\det(C) = -1485$ and $\det(A) = \det(B) = \left(\frac{-1}{3}\right)(-1485) = 495$.
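The bookkeeping in this example (track each switch of rows, leave the determinant alone when adding multiples of rows, then multiply the diagonal) can be sketched as a short routine. This is one possible arrangement of the row operations, not the exact sequence used above; the name `det_by_row_reduction` is hypothetical and exact fractions are an assumption made to avoid rounding.

```python
from fractions import Fraction

def det_by_row_reduction(A):
    """Reduce A to upper triangular form using row switches and row
    operation 3, then multiply the diagonal entries."""
    n = len(A)
    M = [[Fraction(x) for x in row] for row in A]
    sign = 1
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)   # no pivot available: determinant is zero
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            sign = -sign         # each switch multiplies det by -1
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            # Adding a multiple of another row leaves det unchanged.
            M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    result = Fraction(sign)
    for i in range(n):
        result *= M[i][i]        # determinant of a triangular matrix
    return result

A = [[1, 2, 3, 4], [5, 1, 2, 3], [4, 5, 4, 3], [2, 2, -4, 5]]
print(det_by_row_reduction(A))
```

Since no row is ever rescaled, Theorem 6.1.19 is not needed in this variant; only the sign changes from row switches have to be recorded.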
Example 6.1.29 Find the determinant of the matrix

$$\begin{pmatrix} 1 & 2 & 3 & 2 \\ 1 & -3 & 2 & 1 \\ 2 & 1 & 2 & 5 \\ 3 & -4 & 1 & 2 \end{pmatrix}$$

Replace the second row by $(-1)$ times the first row added to it. Next take $-2$ times the first row and add to the third and finally take $-3$ times the first row and add to the last row. This yields

$$\begin{pmatrix} 1 & 2 & 3 & 2 \\ 0 & -5 & -1 & -1 \\ 0 & -3 & -4 & 1 \\ 0 & -10 & -8 & -4 \end{pmatrix}$$

By Theorem 6.1.23 this matrix has the same determinant as the original matrix. Remember you can work with the columns also. Take $-5$ times the last column and add to the second column. This yields

$$\begin{pmatrix} 1 & -8 & 3 & 2 \\ 0 & 0 & -1 & -1 \\ 0 & -8 & -4 & 1 \\ 0 & 10 & -8 & -4 \end{pmatrix}$$

By Theorem 6.1.25 this matrix has the same determinant as the original matrix. Now take $(-1)$ times the third row and add to the top row. This gives

$$\begin{pmatrix} 1 & 0 & 7 & 1 \\ 0 & 0 & -1 & -1 \\ 0 & -8 & -4 & 1 \\ 0 & 10 & -8 & -4 \end{pmatrix}$$

which by Theorem 6.1.23 has the same determinant as the original matrix. Let's expand it now along the first column. This yields the following for the determinant of the original matrix.

$$\det\begin{pmatrix} 0 & -1 & -1 \\ -8 & -4 & 1 \\ 10 & -8 & -4 \end{pmatrix}$$

which equals

$$8 \det\begin{pmatrix} -1 & -1 \\ -8 & -4 \end{pmatrix} + 10 \det\begin{pmatrix} -1 & -1 \\ -4 & 1 \end{pmatrix} = -82.$$



We suggest you do not try to be fancy in using row operations. That is, stick mostly to the one 
which replaces a row or column with a multiple of another row or column added to it. Also note 
there is no way to check your answer other than working the problem more than one way. To be 
sure you have gotten it right you must do this. 

6.2 Applications 

6.2.1 A Formula For The Inverse 

The definition of the determinant in terms of Laplace expansion along a row or column also provides 
a way to give a formula for the inverse of a matrix. Recall the definition of the inverse of a matrix 
in Definition 5.1.28 on Page 111. Also recall the definition of the cofactor matrix given in Definition 
6.1.12 on Page 128. This cofactor matrix was just the matrix which results from replacing the ij th 
entry of the matrix with the ij th cofactor. 

The following theorem says that to find the inverse, take the transpose of the cofactor matrix 
and divide by the determinant. The transpose of the cofactor matrix is called the adjugate or 
sometimes the classical adjoint of the matrix A. In other words, A -1 is equal to one divided by 
the determinant of A times the adjugate matrix of A. This is what the following theorem says with 
more precision. 

Theorem 6.2.1 $A^{-1}$ exists if and only if $\det(A) \neq 0$. If $\det(A) \neq 0$, then $A^{-1} = \left(a^{-1}_{ij}\right)$ where

$$a^{-1}_{ij} = \det(A)^{-1} \operatorname{cof}(A)_{ji}$$

for $\operatorname{cof}(A)_{ij}$ the $ij^{th}$ cofactor of $A$.



Example 6.2.2 Find the inverse of the matrix 



A = 




First find the determinant of this matrix. Using Theorems 6.1.23 - 6.1.25 on Page 131, the 
determinant of this matrix equals the determinant of the matrix 
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which equals 12. The cofactor matrix of A is 



-2 
-2 

8 



6 



-6 



Each entry of A was replaced by its cofactor. Therefore, from the above theorem, the inverse of A 
should equal 

/ 

T 




h \ 



V 



Does it work? You should check to see if it does. When the matrices are multiplied 



/ 



2 
3 

_ 1 
2 




and so it is correct. 



Example 6.2.3 Find the inverse of the matrix 

( \ 



V 



1 1 

"6 3 



5 2 
"6 3 



\ \ 

_1 
2 



/ 



First find its determinant. This determinant is \. The inverse is therefore equal to 






1 
2 




1 


1 


— 


3 


2 





1 

2 

_5 
6 

1 

2 





\ o 




1 1 

6 3 
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Expanding all the 2x2 determinants this yields 

1 x T 



( Z 



Always check your work. 



V 



/ 



and so we got it right. If the result of multiplying these matrices had been something other than the identity matrix, you would know there was an error. When this happens, you need to search for the mistake if you are interested in getting the right answer. A common mistake is to forget to take the transpose of the cofactor matrix.

Proof of Theorem 6.2.1: From the definition of the determinant in terms of expansion along 
a column, and letting $(a_{ir}) = A$, if $\det(A) \neq 0$,

$$\sum_{i=1}^{n} a_{ir} \operatorname{cof}(A)_{ir} \det(A)^{-1} = \det(A) \det(A)^{-1} = 1.$$

Now consider

$$\sum_{i=1}^{n} a_{ir} \operatorname{cof}(A)_{ik} \det(A)^{-1}$$

when $k \neq r$. Replace the $k$th column with the $r$th column to obtain a matrix $B_k$ whose determinant 
equals zero by Theorem 6.1.21. However, expanding this matrix $B_k$ along the $k$th column yields

$$0 = \det(B_k) \det(A)^{-1} = \sum_{i=1}^{n} a_{ir} \operatorname{cof}(A)_{ik} \det(A)^{-1}.$$

Summarizing,

$$\sum_{i=1}^{n} a_{ir} \operatorname{cof}(A)_{ik} \det(A)^{-1} = \delta_{rk} = \begin{cases} 1 & \text{if } r = k \\ 0 & \text{if } r \neq k \end{cases}.$$

Now

$$\sum_{i=1}^{n} a_{ir} \operatorname{cof}(A)_{ik} = \sum_{i=1}^{n} a_{ir} \operatorname{cof}(A)^{T}_{ki}$$

which is the $kr$th entry of $\operatorname{cof}(A)^T A$. Therefore,

$$\frac{\operatorname{cof}(A)^T}{\det(A)} A = I. \tag{6.2}$$

Using the other formula in Definition 6.1.13, and similar reasoning,

$$\sum_{j=1}^{n} a_{rj} \operatorname{cof}(A)_{kj} \det(A)^{-1} = \delta_{rk}.$$

Now

$$\sum_{j=1}^{n} a_{rj} \operatorname{cof}(A)_{kj} = \sum_{j=1}^{n} a_{rj} \operatorname{cof}(A)^{T}_{jk}$$

which is the $rk$th entry of $A \operatorname{cof}(A)^T$. Therefore,

$$A \frac{\operatorname{cof}(A)^T}{\det(A)} = I, \tag{6.3}$$

and it follows from 6.2 and 6.3 that $A^{-1} = \left(a_{ij}^{-1}\right)$, where

$$a_{ij}^{-1} = \operatorname{cof}(A)_{ji} \det(A)^{-1}.$$

In other words,

$$A^{-1} = \frac{\operatorname{cof}(A)^T}{\det(A)}.$$

Now suppose $A^{-1}$ exists. Then by Theorem 6.1.26,

$$1 = \det(I) = \det\left(A A^{-1}\right) = \det(A) \det\left(A^{-1}\right)$$

so $\det(A) \neq 0$. ■ 

This way of finding inverses is especially useful in the case where it is desired to find the inverse 
of a matrix whose entries are functions. 

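Example 6.2.4 Suppose

$$A(t) = \begin{pmatrix} e^t & 0 & 0 \\ 0 & \cos t & \sin t \\ 0 & -\sin t & \cos t \end{pmatrix}.$$

Show that $A(t)^{-1}$ exists and then find it.

First note $\det(A(t)) = e^t \neq 0$ so $A(t)^{-1}$ exists. The cofactor matrix is

$$C(t) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & e^t \cos t & e^t \sin t \\ 0 & -e^t \sin t & e^t \cos t \end{pmatrix}$$

and so the inverse is

$$\frac{1}{e^t} \begin{pmatrix} 1 & 0 & 0 \\ 0 & e^t \cos t & e^t \sin t \\ 0 & -e^t \sin t & e^t \cos t \end{pmatrix}^{T} = \begin{pmatrix} e^{-t} & 0 & 0 \\ 0 & \cos t & -\sin t \\ 0 & \sin t & \cos t \end{pmatrix}.$$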

6.2.2 Cramer's Rule 

This formula for the inverse also implies a famous procedure known as Cramer's rule. Cramer's 
rule gives a formula for the solutions, x, to a system of equations, Ax = y in the special case that A 
is a square matrix. Note this rule does not apply if you have a system of equations in which there 
is a different number of equations than variables. 

In case you are solving a system of equations, $A\mathbf{x} = \mathbf{y}$ for $\mathbf{x}$, it follows that if $A^{-1}$ exists,

$$\mathbf{x} = \left(A^{-1} A\right) \mathbf{x} = A^{-1} (A\mathbf{x}) = A^{-1} \mathbf{y}$$

thus solving the system. Now in the case that $A^{-1}$ exists, there is a formula for $A^{-1}$ given above. 
Using this formula,

$$x_i = \sum_{j=1}^{n} a_{ij}^{-1} y_j = \sum_{j=1}^{n} \frac{1}{\det(A)} \operatorname{cof}(A)_{ji} y_j.$$

By the formula for the expansion of a determinant along a column,

$$x_i = \frac{1}{\det(A)} \det \begin{pmatrix} * & \cdots & y_1 & \cdots & * \\ \vdots & & \vdots & & \vdots \\ * & \cdots & y_n & \cdots & * \end{pmatrix},$$

where here the $i$th column of $A$ is replaced with the column vector $(y_1, \cdots, y_n)^T$, and the determinant 
of this modified matrix is taken and divided by $\det(A)$. This formula is known as Cramer's rule. 
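Cramer's rule translates almost word for word into a short program. The sketch below is only an illustration with exact fractions; the function names `det` and `cramer` and the sample system (coefficient rows $(1,2,1)$, $(3,2,1)$, $(2,-3,2)$ with right side $(1,2,3)$) are choices made here for the demonstration.

```python
from fractions import Fraction

def det(m):
    """Determinant by cofactor expansion along the first row."""
    if not m:
        return 1  # empty matrix: base case of the recursion
    return sum(
        (-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
        for j in range(len(m))
    )

def cramer(a, y):
    """Solve a x = y for square a by replacing one column at a time with y."""
    d = det(a)
    if d == 0:
        raise ValueError("det(A) = 0, so Cramer's rule does not apply")
    solution = []
    for i in range(len(a)):
        # a_i is a with its i-th column replaced by the column y
        a_i = [row[:i] + [y[k]] + row[i + 1:] for k, row in enumerate(a)]
        solution.append(Fraction(det(a_i), d))
    return solution

# Sample system, an assumption made for this sketch.
A = [[1, 2, 1], [3, 2, 1], [2, -3, 2]]
y = [1, 2, 3]
solution = cramer(A, y)
```

Each unknown costs one extra determinant, which is why the rule is attractive for small systems or symbolic entries and unattractive for large numerical ones.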





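Procedure 6.2.5 Suppose $A$ is an $n \times n$ matrix and it is desired to solve the system $A\mathbf{x} = \mathbf{y}$, $\mathbf{y} = (y_1, \cdots, y_n)^T$ for $\mathbf{x} = (x_1, \cdots, x_n)^T$. Then Cramer's rule says

$$x_i = \frac{\det A_i}{\det A}$$

where $A_i$ is obtained from $A$ by replacing the $i$th column of $A$ with the column $(y_1, \cdots, y_n)^T$.

Example 6.2.6 Find $x, y$ if

$$\begin{pmatrix} 1 & 2 & 1 \\ 3 & 2 & 1 \\ 2 & -3 & 2 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}.$$

From Cramer's rule,

$$x = \frac{\begin{vmatrix} 1 & 2 & 1 \\ 2 & 2 & 1 \\ 3 & -3 & 2 \end{vmatrix}}{\begin{vmatrix} 1 & 2 & 1 \\ 3 & 2 & 1 \\ 2 & -3 & 2 \end{vmatrix}} = \frac{1}{2}.$$

Now to find $y$,

$$y = \frac{\begin{vmatrix} 1 & 1 & 1 \\ 3 & 2 & 1 \\ 2 & 3 & 2 \end{vmatrix}}{\begin{vmatrix} 1 & 2 & 1 \\ 3 & 2 & 1 \\ 2 & -3 & 2 \end{vmatrix}} = -\frac{1}{7}.$$

Likewise,

$$z = \frac{\begin{vmatrix} 1 & 2 & 1 \\ 3 & 2 & 2 \\ 2 & -3 & 3 \end{vmatrix}}{\begin{vmatrix} 1 & 2 & 1 \\ 3 & 2 & 1 \\ 2 & -3 & 2 \end{vmatrix}} = \frac{11}{14}.$$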
You see the pattern. For large systems Cramer's rule is less than useful if you want to find an 
answer. This is because to use it you must evaluate determinants. However, you have no practical 
way to evaluate determinants for large matrices other than row operations and if you are using row 
operations, you might just as well use them to solve the system to begin with. It will be a lot less 
trouble. Nevertheless, there are situations in which Cramer's rule is useful. 



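Example 6.2.7 Solve for $z$ if

$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & e^t \cos t & e^t \sin t \\ 0 & -e^t \sin t & e^t \cos t \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 1 \\ t \\ t^2 \end{pmatrix}.$$

You could do it by row operations but it might be easier in this case to use Cramer's rule because 
the matrix of coefficients does not consist of numbers but of functions. Thus

$$z = \frac{\begin{vmatrix} 1 & 0 & 1 \\ 0 & e^t \cos t & t \\ 0 & -e^t \sin t & t^2 \end{vmatrix}}{\begin{vmatrix} 1 & 0 & 0 \\ 0 & e^t \cos t & e^t \sin t \\ 0 & -e^t \sin t & e^t \cos t \end{vmatrix}} = t \left( (\cos t)\, t + \sin t \right) e^{-t}.$$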



You end up doing this sort of thing sometimes in ordinary differential equations in the method of 
variation of parameters. 



6.3 Exercises 

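1. Find the determinants of the following matrices.

(a) $\begin{pmatrix} 1 & 2 & 3 \\ 3 & 2 & 2 \\ 0 & 9 & 8 \end{pmatrix}$ (The answer is 31.)

(b) $\begin{pmatrix} 4 & 3 & 2 \\ 1 & 7 & 8 \\ 3 & -9 & 3 \end{pmatrix}$ (The answer is 375.)

(c) $\begin{pmatrix} 1 & 2 & 3 & 2 \\ 1 & 3 & 2 & 3 \\ 4 & 1 & 5 & 0 \\ 1 & 2 & 1 & 2 \end{pmatrix}$ (The answer is $-2$.)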



2. Find the following determinant by expanding along the first row and second column.

$$\begin{vmatrix} 1 & 2 & 1 \\ 2 & 1 & 3 \\ 2 & 1 & 1 \end{vmatrix}$$

3. Find the following determinant by expanding along the first column and third row.

$$\begin{vmatrix} 1 & 2 & 1 \\ 1 & 0 & 1 \\ 2 & 1 & 1 \end{vmatrix}$$

4. Find the following determinant by expanding along the second row and first column.

$$\begin{vmatrix} 1 & 2 & 1 \\ 2 & 1 & 3 \\ 2 & 1 & 1 \end{vmatrix}$$

5. Compute the determinant by cofactor expansion. Pick the easiest row or column to use.

$$\begin{vmatrix} 1 & 0 & 0 & 1 \\ 2 & 1 & 1 & 0 \\ 0 & 0 & 0 & 2 \\ 2 & 1 & 3 & 1 \end{vmatrix}$$

6. Find the determinant using row operations.

$$\begin{vmatrix} 1 & 2 & 1 \\ 2 & 3 & 2 \\ -4 & 1 & 2 \end{vmatrix}$$

7. Find the determinant using row operations.

$$\begin{vmatrix} 2 & 1 & 3 \\ 2 & 4 & 2 \\ 1 & 4 & -5 \end{vmatrix}$$



8. Find the determinant using row operations. 



9. Find the determinant using row operations. 



1 


2 


1 


2 


3 


1 


-2 


3 


-1 





3 


1 


2 


3 


2 


i 


ion 
1 


s. 
4 


1 


2 


3 


2 


-2 


3 


-1 





3 


3 


2 


1 


2 


e 



10. Verify an example of each property of determinants found in Theorems 6.1.23 - 6.1.25 for 2 x 2 
matrices. 

11. An operation is done to get from the first matrix to the second. Identify what was done and 
tell how it will affect the value of the determinant.

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix} \to \begin{pmatrix} a & c \\ b & d \end{pmatrix}$$

12. An operation is done to get from the first matrix to the second. Identify what was done and 
tell how it will affect the value of the determinant.

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix} \to \begin{pmatrix} c & d \\ a & b \end{pmatrix}$$

13. An operation is done to get from the first matrix to the second. Identify what was done and 
tell how it will affect the value of the determinant.

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix} \to \begin{pmatrix} a & b \\ a + c & b + d \end{pmatrix}$$

14. An operation is done to get from the first matrix to the second. Identify what was done and 
tell how it will affect the value of the determinant.

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix} \to \begin{pmatrix} a & b \\ 2c & 2d \end{pmatrix}$$

15. An operation is done to get from the first matrix to the second. Identify what was done and 
tell how it will affect the value of the determinant.

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix} \to \begin{pmatrix} b & a \\ d & c \end{pmatrix}$$
16. Let $A$ be an $r \times r$ matrix and suppose there are $r - 1$ rows (columns) such that all rows 
(columns) are linear combinations of these $r - 1$ rows (columns). Show $\det(A) = 0$.

17. Show $\det(aA) = a^n \det(A)$ where here $A$ is an $n \times n$ matrix and $a$ is a scalar.

18. Illustrate with an example of $2 \times 2$ matrices that the determinant of a product equals the 
product of the determinants.

19. Is it true that $\det(A + B) = \det(A) + \det(B)$? If this is so, explain why it is so and if it is 
not so, give a counter example.

20. An $n \times n$ matrix is called nilpotent if for some positive integer, $k$ it follows $A^k = 0$. If $A$ is a 
nilpotent matrix and $k$ is the smallest possible integer such that $A^k = 0$, what are the possible 
values of $\det(A)$?

21. A matrix is said to be orthogonal if $A^T A = I$. Thus the inverse of an orthogonal matrix is 
just its transpose. What are the possible values of $\det(A)$ if $A$ is an orthogonal matrix?

22. Fill in the missing entries to make the matrix orthogonal as in Problem 21.

$$\begin{pmatrix} \frac{-1}{\sqrt{2}} & \frac{1}{\sqrt{6}} & \frac{\sqrt{12}}{6} \\ \frac{1}{\sqrt{2}} & \_ & \_ \\ \_ & \frac{\sqrt{6}}{3} & \_ \end{pmatrix}$$

23. Let $A$ and $B$ be two $n \times n$ matrices. $A \sim B$ ($A$ is similar to $B$) means there exists an invertible 
matrix $S$ such that $A = S^{-1} B S$. Show that if $A \sim B$, then $B \sim A$. Show also that $A \sim A$ and 
that if $A \sim B$ and $B \sim C$, then $A \sim C$.

24. In the context of Problem 23 show that if $A \sim B$, then $\det(A) = \det(B)$.

25. Two $n \times n$ matrices, $A$ and $B$, are similar if $B = S^{-1} A S$ for some invertible $n \times n$ matrix 
$S$. Show that if two matrices are similar, they have the same characteristic polynomials. The 
characteristic polynomial of an $n \times n$ matrix $M$ is the polynomial, $\det(\lambda I - M)$.




26. Tell whether the statement is true or false.

(a) If $A$ is a $3 \times 3$ matrix with a zero determinant, then one column must be a multiple of 
some other column.

(b) If any two columns of a square matrix are equal, then the determinant of the matrix 
equals zero.

(c) For $A$ and $B$ two $n \times n$ matrices, $\det(A + B) = \det(A) + \det(B)$.

(d) For $A$ an $n \times n$ matrix, $\det(3A) = 3 \det(A)$.

(e) If $A^{-1}$ exists then $\det\left(A^{-1}\right) = \det(A)^{-1}$.

(f) If $B$ is obtained by multiplying a single row of $A$ by 4 then $\det(B) = 4 \det(A)$.

(g) For $A$ an $n \times n$ matrix, $\det(-A) = (-1)^n \det(A)$.

(h) If $A$ is a real $n \times n$ matrix, then $\det\left(A^T A\right) \geq 0$.

(i) Cramer's rule is useful for finding solutions to systems of linear equations in which there 
is an infinite set of solutions.

(j) If $A^k = 0$ for some positive integer, $k$, then $\det(A) = 0$.

(k) If $A\mathbf{x} = 0$ for some $\mathbf{x} \neq 0$, then $\det(A) = 0$.

27. Use Cramer's rule to find the solution to 

x + 2y = 1 
2x-y = 2 



28. Use Cramer's rule to find the solution to 



x + 2y + z = 1 
2x - y - z = 2 
x + z = 1 




29. Here is a matrix, 



Determine whether the matrix has an inverse by finding whether the determinant is non zero. 
If the determinant is nonzero, find the inverse using the formula for the inverse which involves 
the cofactor matrix. 

30. Here is a matrix, 




Determine whether the matrix has an inverse by finding whether the determinant is non zero. 
If the determinant is nonzero, find the inverse using the formula for the inverse which involves 
the cofactor matrix. 

31. Here is a matrix, 




Determine whether the matrix has an inverse by finding whether the determinant is non zero. 
If the determinant is nonzero, find the inverse using the formula for the inverse which involves 
the cofactor matrix. 

32. Here is a matrix, 




Determine whether the matrix has an inverse by finding whether the determinant is non zero. 
If the determinant is nonzero, find the inverse using the formula for the inverse which involves 
the cofactor matrix. 

33. Here is a matrix, 




Determine whether the matrix has an inverse by finding whether the determinant is non zero. 
If the determinant is nonzero, find the inverse using the formula for the inverse which involves 
the cofactor matrix. 
34. Use the formula for the inverse in terms of the cofactor matrix to find if possible the inverses 
of the matrices 

i i\ t 1 2 3 




If the inverse does not exist, explain why. 
35. Here is a matrix,

$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos t & -\sin t \\ 0 & \sin t & \cos t \end{pmatrix}$$



Does there exist a value of t for which this matrix fails to have an inverse? Explain. 
36. Here is a matrix, 




Does there exist a value of t for which this matrix fails to have an inverse? Explain. 

37. Here is a matrix,

$$\begin{pmatrix} e^t & \cosh t & \sinh t \\ e^t & \sinh t & \cosh t \\ e^t & \cosh t & \sinh t \end{pmatrix}$$

Does there exist a value of t for which this matrix fails to have an inverse? Explain. 

38. Show that if $\det(A) \neq 0$ for $A$ an $n \times n$ matrix, it follows that if $A\mathbf{x} = 0$, then $\mathbf{x} = 0$.

39. Suppose $A$, $B$ are $n \times n$ matrices and that $AB = I$. Show that then $BA = I$. Hint: You might 
do something like this: First explain why $\det(A)$, $\det(B)$ are both nonzero. Then $(AB) A = A$ 
and then show $BA (BA - I) = 0$. From this use what is given to conclude $A (BA - I) = 0$. 
Then use Problem 38. 

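40. Use the formula for the inverse in terms of the cofactor matrix to find the inverse of the matrix

$$A = \begin{pmatrix} e^t & 0 & 0 \\ 0 & e^t \cos t & e^t \sin t \\ 0 & e^t \cos t - e^t \sin t & e^t \cos t + e^t \sin t \end{pmatrix}.$$

41. Find the inverse if it exists of the matrix

$$\begin{pmatrix} e^t & \cos t & \sin t \\ e^t & -\sin t & \cos t \\ e^t & -\cos t & -\sin t \end{pmatrix}.$$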



42. Here is a matrix, 



e _t cost e _t sint 

-e~ l cos t — e _t sin t — e - ^ sin t + e _t ( 
2e _t sint — 2e~^cost 



Does there exist a value of t for which this matrix fails to have an inverse? Explain. 

43. Suppose $A$ is an upper triangular matrix. Show that $A^{-1}$ exists if and only if all elements of 
the main diagonal are non zero. Is it true that $A^{-1}$ will also be upper triangular? Explain. Is 
everything the same for lower triangular matrices? 

44. If $A$, $B$, and $C$ are each $n \times n$ matrices and $ABC$ is invertible, why are each of $A$, $B$, and $C$ 
invertible? 

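45. Let $F(t) = \det \begin{pmatrix} a(t) & b(t) \\ c(t) & d(t) \end{pmatrix}$. Verify

$$F'(t) = \det \begin{pmatrix} a'(t) & b'(t) \\ c(t) & d(t) \end{pmatrix} + \det \begin{pmatrix} a(t) & b(t) \\ c'(t) & d'(t) \end{pmatrix}.$$

Now suppose

$$F(t) = \det \begin{pmatrix} a(t) & b(t) & c(t) \\ d(t) & e(t) & f(t) \\ g(t) & h(t) & i(t) \end{pmatrix}.$$

Use Laplace expansion and the first part to verify $F'(t) =$

$$\det \begin{pmatrix} a'(t) & b'(t) & c'(t) \\ d(t) & e(t) & f(t) \\ g(t) & h(t) & i(t) \end{pmatrix} + \det \begin{pmatrix} a(t) & b(t) & c(t) \\ d'(t) & e'(t) & f'(t) \\ g(t) & h(t) & i(t) \end{pmatrix} + \det \begin{pmatrix} a(t) & b(t) & c(t) \\ d(t) & e(t) & f(t) \\ g'(t) & h'(t) & i'(t) \end{pmatrix}.$$

Conjecture a general result valid for $n \times n$ matrices and explain why it will be true. Can a 
similar thing be done with the columns? 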



46. Let $Ly = y^{(n)} + a_{n-1}(x) y^{(n-1)} + \cdots + a_1(x) y' + a_0(x) y$ where the $a_i$ are given continuous 
functions defined on a closed interval, $(a, b)$ and $y$ is some function which has $n$ derivatives so 
it makes sense to write $Ly$. Suppose $Ly_k = 0$ for $k = 1, 2, \cdots, n$. The Wronskian of these 
functions, $y_i$ is defined as

$$W(y_1, \cdots, y_n)(x) = \det \begin{pmatrix} y_1(x) & \cdots & y_n(x) \\ y_1'(x) & \cdots & y_n'(x) \\ \vdots & & \vdots \\ y_1^{(n-1)}(x) & \cdots & y_n^{(n-1)}(x) \end{pmatrix}.$$

Show that for $W(x) = W(y_1, \cdots, y_n)(x)$ to save space,

$$W'(x) = \det \begin{pmatrix} y_1(x) & \cdots & y_n(x) \\ y_1'(x) & \cdots & y_n'(x) \\ \vdots & & \vdots \\ y_1^{(n)}(x) & \cdots & y_n^{(n)}(x) \end{pmatrix}.$$

Now use the differential equation, $Ly = 0$ which is satisfied by each of these functions, $y_i$ 
and properties of determinants presented above to verify that $W' + a_{n-1}(x) W = 0$. Give an 
explicit solution of this linear differential equation, Abel's formula, and use your answer to 
verify that the Wronskian of these solutions to the equation, $Ly = 0$ either vanishes identically 
on $(a, b)$ or never. Hint: To solve the differential equation, let $A'(x) = a_{n-1}(x)$ and multiply 
both sides of the differential equation by $e^{A(x)}$ and then argue the left side is the derivative of 
something. 

47. Find the following determinants.

(a) $\det \begin{pmatrix} 2 & 2 + 2i & 3 - 3i \\ 2 - 2i & 5 & 1 - 7i \\ 3 + 3i & 1 + 7i & 16 \end{pmatrix}$

(b) $\det \begin{pmatrix} 10 & 2 + 6i & 8 - 6i \\ 2 - 6i & 9 & 1 - 7i \\ 8 + 6i & 1 + 7i & 17 \end{pmatrix}$
The Mathematical Theory Of Determinants* 






7.1 The Function $\operatorname{sgn}_n$ 

This material is definitely not for the faint of heart. It is only for people who want to see everything 
proved. It is a fairly complete and unusually elementary treatment of the subject. There will be 
some repetition between this section and the earlier section on determinants. The main purpose 
is to give all the missing proofs. Two books which give a good introduction to determinants are 
Apostol [1] and Rudin [13]. A recent book which also has a good introduction is Baker [2]. Most 
linear algebra books do not do an honest job presenting this topic. 

It is easiest to give a different definition of the determinant which is clearly well defined and then 
prove the earlier one in terms of Laplace expansion. Let $(i_1, \cdots, i_n)$ be an ordered list of numbers 
from $\{1, \cdots, n\}$. This means the order is important so $(1, 2, 3)$ and $(2, 1, 3)$ are different. 

The following Lemma will be essential in the definition of the determinant. 

Lemma 7.1.1 There exists a unique function, $\operatorname{sgn}_n$ which maps each list of numbers from $\{1, \cdots, n\}$ 
to one of the three numbers, $0$, $1$, or $-1$ which also has the following properties.

$$\operatorname{sgn}_n(1, \cdots, n) = 1 \tag{7.1}$$

$$\operatorname{sgn}_n(i_1, \cdots, p, \cdots, q, \cdots, i_n) = -\operatorname{sgn}_n(i_1, \cdots, q, \cdots, p, \cdots, i_n) \tag{7.2}$$

In words, the second property states that if two of the numbers are switched, the value of the function 
is multiplied by $-1$. Also, in the case where $n > 1$ and $\{i_1, \cdots, i_n\} = \{1, \cdots, n\}$ so that every number 
from $\{1, \cdots, n\}$ appears in the ordered list, $(i_1, \cdots, i_n)$,

$$\operatorname{sgn}_n(i_1, \cdots, i_{\theta-1}, n, i_{\theta+1}, \cdots, i_n) = (-1)^{n-\theta} \operatorname{sgn}_{n-1}(i_1, \cdots, i_{\theta-1}, i_{\theta+1}, \cdots, i_n) \tag{7.3}$$

where $n = i_\theta$ in the ordered list, $(i_1, \cdots, i_n)$. 

Proof: To begin with, it is necessary to show the existence of such a function. This is clearly true 
if $n = 1$. Define $\operatorname{sgn}_1(1) = 1$ and observe that it works. No switching is possible. In the case where 
$n = 2$, it is also clearly true. Let $\operatorname{sgn}_2(1, 2) = 1$ and $\operatorname{sgn}_2(2, 1) = -1$ while $\operatorname{sgn}_2(2, 2) = \operatorname{sgn}_2(1, 1) = 0$ 
and verify it works. Assuming such a function exists for $n$, $\operatorname{sgn}_{n+1}$ will be defined in terms of $\operatorname{sgn}_n$. If 
there are any repeated numbers in $(i_1, \cdots, i_{n+1})$, $\operatorname{sgn}_{n+1}(i_1, \cdots, i_{n+1}) = 0$. If there are no repeats, 
then $n + 1$ appears somewhere in the ordered list. Let $\theta$ be the position of the number $n + 1$ in the 
list. Thus, the list is of the form $(i_1, \cdots, i_{\theta-1}, n + 1, i_{\theta+1}, \cdots, i_{n+1})$. From 7.3 it must be that

$$\operatorname{sgn}_{n+1}(i_1, \cdots, i_{\theta-1}, n + 1, i_{\theta+1}, \cdots, i_{n+1}) = (-1)^{n+1-\theta} \operatorname{sgn}_n(i_1, \cdots, i_{\theta-1}, i_{\theta+1}, \cdots, i_{n+1}).$$

It is necessary to verify this satisfies 7.1 and 7.2 with $n$ replaced with $n + 1$. The first of these is 
obviously true because

$$\operatorname{sgn}_{n+1}(1, \cdots, n, n + 1) = (-1)^{n+1-(n+1)} \operatorname{sgn}_n(1, \cdots, n) = 1.$$

If there are repeated numbers in $(i_1, \cdots, i_{n+1})$, then it is obvious 7.2 holds because both sides would 
equal zero from the above definition. It remains to verify 7.2 in the case where there are no numbers 
repeated in $(i_1, \cdots, i_{n+1})$. Consider

$$\operatorname{sgn}_{n+1}\left(i_1, \cdots, \overset{r}{p}, \cdots, \overset{s}{q}, \cdots, i_{n+1}\right),$$

where the $r$ above the $p$ indicates the number $p$ is in the $r$th position and the $s$ above the $q$ indicates 
that the number $q$ is in the $s$th position. Suppose first that $r < \theta < s$. Then

$$\operatorname{sgn}_{n+1}\left(i_1, \cdots, \overset{r}{p}, \cdots, \overset{\theta}{n+1}, \cdots, \overset{s}{q}, \cdots, i_{n+1}\right) = (-1)^{n+1-\theta} \operatorname{sgn}_n\left(i_1, \cdots, \overset{r}{p}, \cdots, \overset{s-1}{q}, \cdots, i_{n+1}\right)$$

while

$$\operatorname{sgn}_{n+1}\left(i_1, \cdots, \overset{r}{q}, \cdots, \overset{\theta}{n+1}, \cdots, \overset{s}{p}, \cdots, i_{n+1}\right) = (-1)^{n+1-\theta} \operatorname{sgn}_n\left(i_1, \cdots, \overset{r}{q}, \cdots, \overset{s-1}{p}, \cdots, i_{n+1}\right)$$

and so, by induction, a switch of $p$ and $q$ introduces a minus sign in the result. Similarly, if $\theta > s$ 
or if $\theta < r$ it also follows that 7.2 holds. The interesting case is when $\theta = r$ or $\theta = s$. Consider the 
case where $\theta = r$ and note the other case is entirely similar.

$$\operatorname{sgn}_{n+1}\left(i_1, \cdots, \overset{r}{n+1}, \cdots, \overset{s}{q}, \cdots, i_{n+1}\right) = (-1)^{n+1-r} \operatorname{sgn}_n\left(i_1, \cdots, \overset{s-1}{q}, \cdots, i_{n+1}\right) \tag{7.4}$$

while

$$\operatorname{sgn}_{n+1}\left(i_1, \cdots, \overset{r}{q}, \cdots, \overset{s}{n+1}, \cdots, i_{n+1}\right) = (-1)^{n+1-s} \operatorname{sgn}_n\left(i_1, \cdots, \overset{r}{q}, \cdots, i_{n+1}\right). \tag{7.5}$$

By making $s - 1 - r$ switches, move the $q$ which is in the $(s-1)$th position in 7.4 to the $r$th position 
in 7.5. By induction, each of these switches introduces a factor of $-1$ and so

$$\operatorname{sgn}_n\left(i_1, \cdots, \overset{s-1}{q}, \cdots, i_{n+1}\right) = (-1)^{s-1-r} \operatorname{sgn}_n\left(i_1, \cdots, \overset{r}{q}, \cdots, i_{n+1}\right).$$

Therefore,

$$\operatorname{sgn}_{n+1}\left(i_1, \cdots, \overset{r}{n+1}, \cdots, \overset{s}{q}, \cdots, i_{n+1}\right) = (-1)^{n+1-r} \operatorname{sgn}_n\left(i_1, \cdots, \overset{s-1}{q}, \cdots, i_{n+1}\right)$$

$$= (-1)^{n+1-r} (-1)^{s-1-r} \operatorname{sgn}_n\left(i_1, \cdots, \overset{r}{q}, \cdots, i_{n+1}\right)$$

$$= (-1)^{n+s} \operatorname{sgn}_n\left(i_1, \cdots, \overset{r}{q}, \cdots, i_{n+1}\right) = (-1)^{2s-1} (-1)^{n+1-s} \operatorname{sgn}_n\left(i_1, \cdots, \overset{r}{q}, \cdots, i_{n+1}\right)$$

$$= -\operatorname{sgn}_{n+1}\left(i_1, \cdots, \overset{r}{q}, \cdots, \overset{s}{n+1}, \cdots, i_{n+1}\right).$$

This proves the existence of the desired function.

To see this function is unique, note that you can obtain any ordered list of distinct numbers 
from a sequence of switches. If there exist two functions, $f$ and $g$ both satisfying 7.1 and 7.2, you 
could start with $f(1, \cdots, n) = g(1, \cdots, n)$ and applying the same sequence of switches, eventually 
arrive at $f(i_1, \cdots, i_n) = g(i_1, \cdots, i_n)$. If any numbers are repeated, then 7.2 gives both functions 
are equal to zero for that ordered list. ■ 
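The uniqueness argument suggests a direct way to compute $\operatorname{sgn}_n$: sort the list by switches and flip the sign once per switch. The sketch below follows that idea; the function name `sgn` and the particular switching strategy (put 1 in the first slot, then 2 in the second, and so on) are choices made here, and any other sequence of switches would give the same value by the lemma.

```python
def sgn(lst):
    """sgn_n of an ordered list of n numbers taken from {1, ..., n}."""
    items = list(lst)
    # A repeated number forces the value 0 (switch the two equal entries in 7.2).
    if len(set(items)) < len(items):
        return 0
    sign = 1
    for pos in range(len(items)):
        target = pos + 1
        if items[pos] != target:
            # One switch puts `target` into its slot and changes the sign.
            where = items.index(target)
            items[pos], items[where] = items[where], items[pos]
            sign = -sign
    return sign
```

For instance, `sgn((2, 1, 3))` needs one switch to reach `(1, 2, 3)`, so its value is $-1$, while `sgn((2, 3, 1))` needs two switches and has value $1$.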

In what follows $\operatorname{sgn}$ will often be used rather than $\operatorname{sgn}_n$ because the context supplies the 
appropriate $n$. 

7.2 The Determinant 

Definition 7.2.1 Let $f$ be a function which has the set of ordered lists of numbers from $\{1, \cdots, n\}$ 
as its domain. Define

$$\sum_{(k_1, \cdots, k_n)} f(k_1 \cdots k_n)$$

to be the sum of all the $f(k_1 \cdots k_n)$ for all possible choices of ordered lists $(k_1, \cdots, k_n)$ of numbers 
of $\{1, \cdots, n\}$. For example,

$$\sum_{(k_1, k_2)} f(k_1, k_2) = f(1, 2) + f(2, 1) + f(1, 1) + f(2, 2).$$
7.2.1 The Definition 

Definition 7.2.2 Let $(a_{ij}) = A$ denote an $n \times n$ matrix. The determinant of $A$, denoted by $\det(A)$ 
is defined by

$$\det(A) = \sum_{(k_1, \cdots, k_n)} \operatorname{sgn}(k_1, \cdots, k_n)\, a_{1k_1} \cdots a_{nk_n}$$

where the sum is taken over all ordered lists of numbers from $\{1, \cdots, n\}$. Note it suffices to take the 
sum over only those ordered lists in which there are no repeats because if there are, $\operatorname{sgn}(k_1, \cdots, k_n) = 0$ 
and so that term contributes 0 to the sum. 
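The definition can be evaluated literally for small matrices, which is a useful sanity check even though the sum has $n^n$ terms and is hopeless for large $n$. The following sketch assumes nothing beyond the definition; the names `sgn` and `det_by_definition` are illustrative, and `sgn` is computed here by counting switches.

```python
from itertools import product

def sgn(lst):
    """sgn_n of an ordered list of n numbers from {1, ..., n}; 0 if a number repeats."""
    items = list(lst)
    if len(set(items)) < len(items):
        return 0
    sign = 1
    for pos in range(len(items)):
        if items[pos] != pos + 1:
            where = items.index(pos + 1)
            items[pos], items[where] = items[where], items[pos]
            sign = -sign
    return sign

def det_by_definition(a):
    """Sum sgn(k_1, ..., k_n) * a[1][k_1] * ... * a[n][k_n] over all ordered lists."""
    n = len(a)
    total = 0
    for ks in product(range(1, n + 1), repeat=n):
        s = sgn(ks)
        if s == 0:
            continue  # lists with repeats contribute nothing
        term = s
        for row, k in enumerate(ks):
            term *= a[row][k - 1]  # lists are 1-based, Python indices are 0-based
        total += term
    return total
```

Only the $n!$ lists without repeats survive the `continue`, which matches the remark at the end of the definition.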
7.2.2 Permuting Rows Or Columns 

Let $A$ be an $n \times n$ matrix, $A = (a_{ij})$ and let $(r_1, \cdots, r_n)$ denote an ordered list of $n$ numbers from 
$\{1, \cdots, n\}$. Let $A(r_1, \cdots, r_n)$ denote the matrix whose $k$th row is the $r_k$ row of the matrix $A$. Thus

$$\det(A(r_1, \cdots, r_n)) = \sum_{(k_1, \cdots, k_n)} \operatorname{sgn}(k_1, \cdots, k_n)\, a_{r_1 k_1} \cdots a_{r_n k_n} \tag{7.6}$$

and

$$A(1, \cdots, n) = A.$$

Proposition 7.2.3 Let $(r_1, \cdots, r_n)$ be an ordered list of numbers from $\{1, \cdots, n\}$. Then

$$\operatorname{sgn}(r_1, \cdots, r_n) \det(A) = \sum_{(k_1, \cdots, k_n)} \operatorname{sgn}(k_1, \cdots, k_n)\, a_{r_1 k_1} \cdots a_{r_n k_n} \tag{7.7}$$

$$= \det(A(r_1, \cdots, r_n)). \tag{7.8}$$

Proof: Let $(1, \cdots, n) = (1, \cdots, r, \cdots, s, \cdots, n)$ so $r < s$.

$$\det(A(1, \cdots, r, \cdots, s, \cdots, n)) = \sum_{(k_1, \cdots, k_n)} \operatorname{sgn}(k_1, \cdots, k_r, \cdots, k_s, \cdots, k_n)\, a_{1k_1} \cdots a_{rk_r} \cdots a_{sk_s} \cdots a_{nk_n}, \tag{7.9}$$

and renaming the variables, calling $k_s$, $k_r$ and $k_r$, $k_s$, this equals

$$= \sum_{(k_1, \cdots, k_n)} \operatorname{sgn}(k_1, \cdots, k_s, \cdots, k_r, \cdots, k_n)\, a_{1k_1} \cdots a_{rk_s} \cdots a_{sk_r} \cdots a_{nk_n}$$

$$= \sum_{(k_1, \cdots, k_n)} -\operatorname{sgn}\Big(k_1, \cdots, \overbrace{k_r, \cdots, k_s}^{\text{These got switched}}, \cdots, k_n\Big)\, a_{1k_1} \cdots a_{sk_r} \cdots a_{rk_s} \cdots a_{nk_n}$$

$$= -\det(A(1, \cdots, s, \cdots, r, \cdots, n)). \tag{7.10}$$

Consequently,

$$\det(A(1, \cdots, s, \cdots, r, \cdots, n)) = -\det(A(1, \cdots, r, \cdots, s, \cdots, n)) = -\det(A).$$

Now letting $A(1, \cdots, s, \cdots, r, \cdots, n)$ play the role of $A$, and continuing in this way, switching pairs 
of numbers,

$$\det(A(r_1, \cdots, r_n)) = (-1)^p \det(A)$$

where it took $p$ switches to obtain $(r_1, \cdots, r_n)$ from $(1, \cdots, n)$. By Lemma 7.1.1, this implies

$$\det(A(r_1, \cdots, r_n)) = (-1)^p \det(A) = \operatorname{sgn}(r_1, \cdots, r_n) \det(A)$$

and proves the proposition in the case when there are no repeated numbers in the ordered list, 
$(r_1, \cdots, r_n)$. However, if there is a repeat, say the $r$th row equals the $s$th row, then the reasoning of 
7.9 - 7.10 shows that $\det A(r_1, \cdots, r_n) = 0$ and also $\operatorname{sgn}(r_1, \cdots, r_n) = 0$ so the formula holds in this 
case also. ■ 

Observation 7.2.4 There are $n!$ ordered lists of distinct numbers from $\{1, \cdots, n\}$.

To see this, consider $n$ slots placed in order. There are $n$ choices for the first slot. For each of 
these choices, there are $n - 1$ choices for the second. Thus there are $n(n - 1)$ ways to fill the first 
two slots. Then for each of these ways there are $n - 2$ choices left for the third slot. Continuing this 
way, there are $n!$ ordered lists of distinct numbers from $\{1, \cdots, n\}$ as stated in the observation. 
7.2.3 A Symmetric Definition 

With the above, it is possible to give a more symmetric description of the determinant from which 
it will follow that $\det(A) = \det\left(A^T\right)$.

Corollary 7.2.5 The following formula for $\det(A)$ is valid.

$$\det(A) = \frac{1}{n!} \sum_{(r_1, \cdots, r_n)} \sum_{(k_1, \cdots, k_n)} \operatorname{sgn}(r_1, \cdots, r_n) \operatorname{sgn}(k_1, \cdots, k_n)\, a_{r_1 k_1} \cdots a_{r_n k_n}. \tag{7.11}$$

And also $\det\left(A^T\right) = \det(A)$ where $A^T$ is the transpose of $A$. (Recall that for $A^T = \left(a^T_{ij}\right)$, $a^T_{ij} = a_{ji}$.)

Proof: From Proposition 7.2.3, if the $r_i$ are distinct,

$$\det(A) = \sum_{(k_1, \cdots, k_n)} \operatorname{sgn}(r_1, \cdots, r_n) \operatorname{sgn}(k_1, \cdots, k_n)\, a_{r_1 k_1} \cdots a_{r_n k_n}.$$

Summing over all ordered lists, $(r_1, \cdots, r_n)$ where the $r_i$ are distinct, (If the $r_i$ are not distinct, 
$\operatorname{sgn}(r_1, \cdots, r_n) = 0$ and so there is no contribution to the sum.)

$$n! \det(A) = \sum_{(r_1, \cdots, r_n)} \sum_{(k_1, \cdots, k_n)} \operatorname{sgn}(r_1, \cdots, r_n) \operatorname{sgn}(k_1, \cdots, k_n)\, a_{r_1 k_1} \cdots a_{r_n k_n}.$$

This proves the corollary since the formula gives the same number for $A$ as it does for $A^T$. ■ 

7.2.4 The Alternating Property Of The Determinant 

Corollary 7.2.6 If two rows or two columns in an $n \times n$ matrix $A$, are switched, the determinant 
of the resulting matrix equals $(-1)$ times the determinant of the original matrix. If $A$ is an $n \times n$ 
matrix in which two rows are equal or two columns are equal then $\det(A) = 0$. Suppose the $i$th row 
of $A$ equals $(xa_1 + yb_1, \cdots, xa_n + yb_n)$. Then

$$\det(A) = x \det(A_1) + y \det(A_2)$$

where the $i$th row of $A_1$ is $(a_1, \cdots, a_n)$ and the $i$th row of $A_2$ is $(b_1, \cdots, b_n)$, all other rows of $A_1$ 
and $A_2$ coinciding with those of $A$. In other words, $\det$ is a linear function of each row of $A$. The same 
is true with the word "row" replaced with the word "column".

Proof: By Proposition 7.2.3 when two rows are switched, the determinant of the resulting matrix 
is $(-1)$ times the determinant of the original matrix. By Corollary 7.2.5 the same holds for columns 
because the columns of the matrix equal the rows of the transposed matrix. Thus if $A_1$ is the matrix 
obtained from $A$ by switching two columns,

$$\det(A) = \det\left(A^T\right) = -\det\left(A_1^T\right) = -\det(A_1).$$

If $A$ has two equal columns or two equal rows, then switching them results in the same matrix. 
Therefore, $\det(A) = -\det(A)$ and so $\det(A) = 0$. 
It remains to verify the last assertion.

$$\det(A) = \sum_{(k_1, \cdots, k_n)} \operatorname{sgn}(k_1, \cdots, k_n)\, a_{1k_1} \cdots (x a_{k_i} + y b_{k_i}) \cdots a_{nk_n}$$

$$= x \sum_{(k_1, \cdots, k_n)} \operatorname{sgn}(k_1, \cdots, k_n)\, a_{1k_1} \cdots a_{k_i} \cdots a_{nk_n} + y \sum_{(k_1, \cdots, k_n)} \operatorname{sgn}(k_1, \cdots, k_n)\, a_{1k_1} \cdots b_{k_i} \cdots a_{nk_n}$$

$$= x \det(A_1) + y \det(A_2).$$

The same is true of columns because $\det\left(A^T\right) = \det(A)$ and the rows of $A^T$ are the columns of $A$. ■ 
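The three assertions of the corollary are easy to spot check numerically. The sketch below is only an illustration: the helper `det` sums over permutations with the sign taken from the parity of the number of inversions (which agrees with $\operatorname{sgn}$), and the matrices and scalars are arbitrary values chosen here.

```python
from itertools import permutations

def det(a):
    """Determinant as a signed sum over permutations of the column indices."""
    n = len(a)
    total = 0
    for perm in permutations(range(n)):
        inversions = sum(
            1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j]
        )
        term = -1 if inversions % 2 else 1
        for row in range(n):
            term *= a[row][perm[row]]
        total += term
    return total

A = [[1, 2, 1], [3, 2, 1], [2, -3, 2]]           # arbitrary sample matrix
swapped = [A[1], A[0], A[2]]                     # first two rows switched
repeated = [A[0], A[0], A[2]]                    # two equal rows

# Linearity in the second row: that row is x * a_row + y * b_row.
a_row, b_row, x, y = [1, 0, 2], [0, 1, -1], 3, 2
combo = [A[0], [x * p + y * q for p, q in zip(a_row, b_row)], A[2]]
A1 = [A[0], a_row, A[2]]
A2 = [A[0], b_row, A[2]]
```

With these definitions, `det(swapped)` equals `-det(A)`, `det(repeated)` is zero, and `det(combo)` equals `x * det(A1) + y * det(A2)`.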

7.2.5 Linear Combinations And Determinants 

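Definition 7.2.7 A vector $\mathbf{w}$ is a linear combination of the vectors $\{\mathbf{v}_1, \cdots, \mathbf{v}_r\}$ if there exist 
scalars $c_1, \cdots, c_r$ such that $\mathbf{w} = \sum_{k=1}^{r} c_k \mathbf{v}_k$. This is the same as saying

$$\mathbf{w} \in \operatorname{span}\{\mathbf{v}_1, \cdots, \mathbf{v}_r\}.$$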



The following corollary is also of great use. 

Corollary 7.2.8 Suppose $A$ is an $n \times n$ matrix and some column (row) is a linear combination of 
$r$ other columns (rows). Then $\det(A) = 0$.

Proof: Let $A = \begin{pmatrix} \mathbf{a}_1 & \cdots & \mathbf{a}_n \end{pmatrix}$ be the columns of $A$ and suppose the condition that one 
column is a linear combination of $r$ of the others is satisfied. Then by using Corollary 7.2.6 you may 
rearrange the columns to have the $n$th column a linear combination of the first $r$ columns. Thus 
$\mathbf{a}_n = \sum_{k=1}^{r} c_k \mathbf{a}_k$ and so

$$\det(A) = \det \begin{pmatrix} \mathbf{a}_1 & \cdots & \mathbf{a}_r & \cdots & \mathbf{a}_{n-1} & \sum_{k=1}^{r} c_k \mathbf{a}_k \end{pmatrix}.$$

By Corollary 7.2.6

$$\det(A) = \sum_{k=1}^{r} c_k \det \begin{pmatrix} \mathbf{a}_1 & \cdots & \mathbf{a}_r & \cdots & \mathbf{a}_{n-1} & \mathbf{a}_k \end{pmatrix} = 0.$$

The case for rows follows from the fact that $\det(A) = \det\left(A^T\right)$. ■ 

7.2.6 The Determinant Of A Product 

Recall the following definition of matrix multiplication. 

Definition 7.2.9 If $A$ and $B$ are $n \times n$ matrices, $A = (a_{ij})$ and $B = (b_{ij})$, $AB = (c_{ij})$ where

$$c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}.$$



One of the most important rules about determinants is that the determinant of a product equals 
the product of the determinants. 
Theorem 7.2.10 Let $A$ and $B$ be $n \times n$ matrices. Then

$$\det(AB) = \det(A) \det(B).$$

Proof: Let $c_{ij}$ be the $ij$th entry of $AB$. Then by Proposition 7.2.3,

$$\det(AB) = \sum_{(k_1, \cdots, k_n)} \operatorname{sgn}(k_1, \cdots, k_n)\, c_{1k_1} \cdots c_{nk_n}$$

$$= \sum_{(k_1, \cdots, k_n)} \operatorname{sgn}(k_1, \cdots, k_n) \left( \sum_{r_1} a_{1r_1} b_{r_1 k_1} \right) \cdots \left( \sum_{r_n} a_{nr_n} b_{r_n k_n} \right)$$

$$= \sum_{(r_1, \cdots, r_n)} \sum_{(k_1, \cdots, k_n)} \operatorname{sgn}(k_1, \cdots, k_n)\, b_{r_1 k_1} \cdots b_{r_n k_n} \left( a_{1r_1} \cdots a_{nr_n} \right)$$

$$= \sum_{(r_1, \cdots, r_n)} \operatorname{sgn}(r_1, \cdots, r_n)\, a_{1r_1} \cdots a_{nr_n} \det(B) = \det(A) \det(B). \ ■$$
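A quick numerical check of the product rule can be reassuring before working through the index manipulation. The sketch below is an illustration only; `det` and `matmul` are helper names introduced here, and the two integer matrices are arbitrary samples.

```python
from itertools import permutations

def det(a):
    """Determinant as a signed sum over permutations of the column indices."""
    n = len(a)
    total = 0
    for perm in permutations(range(n)):
        inversions = sum(
            1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j]
        )
        term = -1 if inversions % 2 else 1
        for row in range(n):
            term *= a[row][perm[row]]
        total += term
    return total

def matmul(a, b):
    """Product of two n x n matrices: c_ij is the sum over k of a_ik * b_kj."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

A = [[1, 2, 1], [3, 2, 1], [2, -3, 2]]   # arbitrary sample matrices
B = [[2, 1, 3], [2, 4, 2], [1, 4, -5]]
lhs = det(matmul(A, B))
rhs = det(A) * det(B)
```

Here `lhs` and `rhs` agree, as the theorem predicts; using integer entries keeps the comparison exact.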

7.2.7 Cofactor Expansions 

Lemma 7.2.11 Suppose a matrix is of the form

$$M = \begin{pmatrix} A & * \\ 0 & a \end{pmatrix} \tag{7.12}$$

or

$$M = \begin{pmatrix} A & 0 \\ * & a \end{pmatrix} \tag{7.13}$$

where $a$ is a number and $A$ is an $(n - 1) \times (n - 1)$ matrix and $*$ denotes either a column or a row 
having length $n - 1$ and the $0$ denotes either a column or a row of length $n - 1$ consisting entirely 
of zeros. Then

$$\det(M) = a \det(A).$$

Proof: Denote $M$ by $(m_{ij})$. Thus in the first case, $m_{nn} = a$ and $m_{ni} = 0$ if $i \neq n$ while in the 
second case, $m_{nn} = a$ and $m_{in} = 0$ if $i \neq n$. From the definition of the determinant,

$$\det(M) = \sum_{(k_1, \cdots, k_n)} \operatorname{sgn}_n(k_1, \cdots, k_n)\, m_{1k_1} \cdots m_{nk_n}.$$

Letting $\theta$ denote the position of $n$ in the ordered list, $(k_1, \cdots, k_n)$ then using the earlier conventions 
used to prove Lemma 7.1.1, $\det(M)$ equals

$$\sum_{(k_1, \cdots, k_n)} (-1)^{n-\theta} \operatorname{sgn}_{n-1}\left(k_1, \cdots, k_{\theta-1}, k_{\theta+1}, \cdots, k_n\right) m_{1k_1} \cdots m_{nk_n}.$$

Now suppose the second case, 7.13. Then if $\theta \neq n$, the factor $m_{\theta k_\theta} = m_{\theta n}$ in the above expression equals zero. 
Therefore, the only terms which survive are those for which $\theta = n$ or in other words, those for which 
$k_n = n$. Therefore, the above expression reduces to

$$a \sum_{(k_1, \cdots, k_{n-1})} \operatorname{sgn}_{n-1}(k_1, \cdots, k_{n-1})\, m_{1k_1} \cdots m_{(n-1)k_{n-1}} = a \det(A).$$

To get the assertion in the situation of the first case, 7.12, use Corollary 7.2.5 and the case just proved to write

$$\det(M) = \det\left(M^T\right) = \det \begin{pmatrix} A^T & 0 \\ * & a \end{pmatrix} = a \det\left(A^T\right) = a \det(A). \ ■$$

In terms of the theory of determinants, arguably the most important idea is that of Laplace 
expansion along a row or a column. This will follow from the above definition of a determinant. 
Definition 7.2.12 Let $A = (a_{ij})$ be an $n \times n$ matrix. Then a new matrix called the cofactor matrix, 
$\operatorname{cof}(A)$ is defined by $\operatorname{cof}(A) = (c_{ij})$ where to obtain $c_{ij}$ delete the $i$th row and the $j$th column of $A$, 
take the determinant of the $(n - 1) \times (n - 1)$ matrix which results, (This is called the $ij$th minor of 
$A$.) and then multiply this number by $(-1)^{i+j}$. To make the formulas easier to remember, $\operatorname{cof}(A)_{ij}$ 
will denote the $ij$th entry of the cofactor matrix. 

The following is the main result. Earlier this was given as a definition and the outrageous 
totally unjustified assertion was made that the same number would be obtained by expanding the 
determinant along any row or column. The following theorem proves this assertion. 

Theorem 7.2.13 Let $A$ be an $n \times n$ matrix where $n \geq 2$. Then

$$\det(A) = \sum_{j=1}^{n} a_{ij} \operatorname{cof}(A)_{ij} = \sum_{i=1}^{n} a_{ij} \operatorname{cof}(A)_{ij}. \tag{7.14}$$

The first formula consists of expanding the determinant along the $i$th row and the second expands 
the determinant along the $j$th column. 

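Proof: Let $(a_{i1}, \cdots, a_{in})$ be the $i^{th}$ row of $A$. Let $B_j$ be the matrix obtained from $A$ by leaving
every row the same except the $i^{th}$ row which in $B_j$ equals

$$(0, \cdots, 0, a_{ij}, 0, \cdots, 0).$$

Then by Corollary 7.2.6,

$$\det(A) = \sum_{j=1}^{n} \det(B_j).$$

Denote by $A^{ij}$ the $(n-1) \times (n-1)$ matrix obtained by deleting the $i^{th}$ row and the $j^{th}$ column of
$A$. Thus $\operatorname{cof}(A)_{ij} = (-1)^{i+j} \det(A^{ij})$. At this point, recall that from Proposition 7.2.3, when two
rows or two columns in a matrix $M$ are switched, this results in multiplying the determinant of the
old matrix by $-1$ to get the determinant of the new matrix. Moving the $i^{th}$ row of $B_j$ to the bottom
takes $n-i$ switches of adjacent rows, and moving the $j^{th}$ column to the far right takes $n-j$ switches
of adjacent columns. Therefore, by Lemma 7.2.11,

$$\det(B_j) = (-1)^{n-j} (-1)^{n-i} \det \begin{pmatrix} A^{ij} & * \\ 0 & a_{ij} \end{pmatrix}
= (-1)^{i+j} \det \begin{pmatrix} A^{ij} & * \\ 0 & a_{ij} \end{pmatrix} = a_{ij} \operatorname{cof}(A)_{ij}.$$

Therefore,

$$\det(A) = \sum_{j=1}^{n} a_{ij} \operatorname{cof}(A)_{ij}$$

which is the formula for expanding $\det(A)$ along the $i^{th}$ row. Also,

$$\det(A) = \det(A^T) = \sum_{j=1}^{n} a^T_{ij} \operatorname{cof}(A^T)_{ij} = \sum_{j=1}^{n} a_{ji} \operatorname{cof}(A)_{ji}$$

which is the formula for expanding $\det(A)$ along the $i^{th}$ column. ■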
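
The statement of Theorem 7.2.13 can be checked numerically on a small matrix. The following is only an illustrative sketch in Python, not part of the text's development: the helper names `minor`, `det`, `cof`, `expand_along_row` and `expand_along_column` are made up for this example, indices are 0-based, and the matrix is an arbitrary choice. Every row expansion and every column expansion should return the same number.

```python
def minor(a, i, j):
    """Delete row i and column j (0-based) from the square matrix a."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(a) if k != i]

def det(a):
    """Determinant computed by expanding along the first row."""
    if len(a) == 1:
        return a[0][0]
    return sum((-1) ** j * a[0][j] * det(minor(a, 0, j)) for j in range(len(a)))

def cof(a, i, j):
    """cof(A)_ij: (-1)^(i+j) times the determinant of the ij-th minor."""
    return (-1) ** (i + j) * det(minor(a, i, j))

def expand_along_row(a, i):
    """Sum over j of a_ij * cof(A)_ij for a fixed row i."""
    return sum(a[i][j] * cof(a, i, j) for j in range(len(a)))

def expand_along_column(a, j):
    """Sum over i of a_ij * cof(A)_ij for a fixed column j."""
    return sum(a[i][j] * cof(a, i, j) for i in range(len(a)))

# An arbitrary 3 x 3 matrix chosen for illustration.
A = [[1, 2, 3], [3, 2, 1], [2, 1, 4]]
row_values = [expand_along_row(A, i) for i in range(3)]
column_values = [expand_along_column(A, j) for j in range(3)]
print(row_values, column_values)
```

All six expansions agree with `det(A)`, which is what the theorem asserts.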



7.2.8 Formula For The Inverse

Note that this gives an easy way to write a formula for the inverse of an $n \times n$ matrix.

Theorem 7.2.14 $A^{-1}$ exists if and only if $\det(A) \neq 0$. If $\det(A) \neq 0$, then $A^{-1} = (a^{-1}_{ij})$ where

$$a^{-1}_{ij} = \det(A)^{-1} \operatorname{cof}(A)_{ji}$$

for $\operatorname{cof}(A)_{ij}$ the $ij^{th}$ cofactor of $A$.

Proof: By Theorem 7.2.13 and letting $(a_{ir}) = A$, if $\det(A) \neq 0$,

$$\sum_{i=1}^{n} a_{ir} \operatorname{cof}(A)_{ir} \det(A)^{-1} = \det(A) \det(A)^{-1} = 1.$$

Now consider

$$\sum_{i=1}^{n} a_{ir} \operatorname{cof}(A)_{ik} \det(A)^{-1}$$

when $k \neq r$. Replace the $k^{th}$ column with the $r^{th}$ column to obtain a matrix $B_k$ whose determinant
equals zero by Corollary 7.2.6. However, expanding this matrix along the $k^{th}$ column yields

$$0 = \det(B_k) \det(A)^{-1} = \sum_{i=1}^{n} a_{ir} \operatorname{cof}(A)_{ik} \det(A)^{-1}.$$

Summarizing,

$$\sum_{i=1}^{n} a_{ir} \operatorname{cof}(A)_{ik} \det(A)^{-1} = \delta_{rk}.$$





Using the other formula in Theorem 7.2.13, and similar reasoning,

$$\sum_{j=1}^{n} a_{rj} \operatorname{cof}(A)_{kj} \det(A)^{-1} = \delta_{rk}.$$

This proves that if $\det(A) \neq 0$, then $A^{-1}$ exists with $A^{-1} = (a^{-1}_{ij})$, where

$$a^{-1}_{ij} = \operatorname{cof}(A)_{ji} \det(A)^{-1}.$$

Now suppose $A^{-1}$ exists. Then by Theorem 7.2.10,

$$1 = \det(I) = \det(A A^{-1}) = \det(A) \det(A^{-1})$$

so $\det(A) \neq 0$. ■
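
As a concrete check of the formula in Theorem 7.2.14, here is a minimal Python sketch using exact rational arithmetic. The function names (`inverse_by_cofactors`, `matmul` and the helpers) are hypothetical, indices are 0-based, and the test matrix is an arbitrary choice; this is meant as an illustration of the formula, not as a practical way to invert large matrices.

```python
from fractions import Fraction

def minor(a, i, j):
    """Delete row i and column j (0-based) from the square matrix a."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(a) if k != i]

def det(a):
    """Determinant computed by expanding along the first row."""
    if len(a) == 1:
        return a[0][0]
    return sum((-1) ** j * a[0][j] * det(minor(a, 0, j)) for j in range(len(a)))

def cof(a, i, j):
    """cof(A)_ij: (-1)^(i+j) times the determinant of the ij-th minor."""
    return (-1) ** (i + j) * det(minor(a, i, j))

def inverse_by_cofactors(a):
    """Entry (i, j) of the inverse is cof(A)_ji / det(A); note the transposed indices."""
    d = det(a)
    if d == 0:
        raise ValueError("det(A) = 0, so the inverse does not exist")
    n = len(a)
    return [[Fraction(cof(a, j, i)) / d for j in range(n)] for i in range(n)]

def matmul(a, b):
    """Ordinary matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

A = [[1, 2, 3], [3, 2, 1], [2, 1, 4]]   # arbitrary matrix with det(A) = -16
A_inv = inverse_by_cofactors(A)
I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(matmul(A, A_inv) == I3, matmul(A_inv, A) == I3)
```

Both products give the identity, matching the two sums equal to $\delta_{rk}$ in the proof.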

The next corollary points out that if an n x n matrix A has a right or a left inverse, then it has 
an inverse. 
Corollary 7.2.15 Let $A$ be an $n \times n$ matrix and suppose there exists an $n \times n$ matrix $B$ such that
$BA = I$. Then $A^{-1}$ exists and $A^{-1} = B$. Also, if there exists $C$ an $n \times n$ matrix such that $AC = I$,
then $A^{-1}$ exists and $A^{-1} = C$.

Proof: Since $BA = I$, Theorem 7.2.10 implies

$$\det B \det A = 1$$

and so $\det A \neq 0$. Therefore from Theorem 7.2.14, $A^{-1}$ exists. Therefore,

$$A^{-1} = (BA) A^{-1} = B (A A^{-1}) = BI = B.$$

The case where $AC = I$ is handled similarly. ■

The conclusion of this corollary is that left inverses, right inverses and inverses are all the same 
in the context of n x n matrices. 

Theorem 7.2.14 says that to find the inverse, take the transpose of the cofactor matrix and 
divide by the determinant. The transpose of the cofactor matrix is called the adjugate or sometimes 
the classical adjoint of the matrix A. It is an abomination to call it the adjoint although you do 
sometimes see it referred to in this way. In words, $A^{-1}$ is equal to one over the determinant of $A$
times the adjugate matrix of $A$.

7.2.9 Cramer's Rule 

In case you are solving a system of equations, $Ax = y$ for $x$, it follows that if $A^{-1}$ exists,

$$x = (A^{-1} A) x = A^{-1} (Ax) = A^{-1} y$$

thus solving the system. Now in the case that $A^{-1}$ exists, there is a formula for $A^{-1}$ given above.
Using this formula,

$$x_i = \sum_{j=1}^{n} a^{-1}_{ij} y_j = \sum_{j=1}^{n} \frac{1}{\det(A)} \operatorname{cof}(A)_{ji} y_j.$$

By the formula for the expansion of a determinant along a column,

$$x_i = \frac{1}{\det(A)} \det \begin{pmatrix} * & \cdots & y_1 & \cdots & * \\ \vdots & & \vdots & & \vdots \\ * & \cdots & y_n & \cdots & * \end{pmatrix},$$

where here the $i^{th}$ column of $A$ is replaced with the column vector $(y_1, \cdots, y_n)^T$, and the determinant
of this modified matrix is taken and divided by $\det(A)$. This formula is known as Cramer's rule.
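
Cramer's rule translates directly into a few lines of code. The sketch below is an illustration only, with a hypothetical function name `cramer`, 0-based indices, and a system made up so that the solution is known in advance.

```python
from fractions import Fraction

def minor(a, i, j):
    """Delete row i and column j (0-based) from the square matrix a."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(a) if k != i]

def det(a):
    """Determinant computed by expanding along the first row."""
    if len(a) == 1:
        return a[0][0]
    return sum((-1) ** j * a[0][j] * det(minor(a, 0, j)) for j in range(len(a)))

def cramer(a, y):
    """Solve Ax = y when det(A) != 0: x_i = det(A with column i replaced by y) / det(A)."""
    d = det(a)
    if d == 0:
        raise ValueError("Cramer's rule requires det(A) != 0")
    x = []
    for i in range(len(a)):
        replaced = [row[:i] + [y[k]] + row[i + 1:] for k, row in enumerate(a)]
        x.append(Fraction(det(replaced)) / d)
    return x

A = [[1, 2, 3], [3, 2, 1], [2, 1, 4]]
y = [5, 3, 9]            # chosen so that the solution is x = (1, -1, 2)
x = cramer(A, y)
print(x)
```

For anything beyond small systems, row reduction is far cheaper than computing $n+1$ determinants; the rule is mainly of theoretical interest.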

7.2.10 Upper Triangular Matrices 

Definition 7.2.16 A matrix $M$ is upper triangular if $M_{ij} = 0$ whenever $i > j$. Thus such a matrix
equals zero below the main diagonal, the entries of the form $M_{ii}$, as shown.

$$\begin{pmatrix} * & * & \cdots & * \\ 0 & * & \ddots & \vdots \\ \vdots & \ddots & \ddots & * \\ 0 & \cdots & 0 & * \end{pmatrix}$$

A lower triangular matrix is defined similarly as a matrix for which all entries above the main
diagonal are equal to zero.

With this definition, here is a simple corollary of Theorem 7.2.13. 

Corollary 7.2.17 Let M be an upper (lower) triangular matrix. Then det (M) is obtained by taking 
the product of the entries on the main diagonal. 
7.3 The Cayley Hamilton Theorem*

Definition 7.3.1 Let $A$ be an $n \times n$ matrix. The characteristic polynomial is defined as

$$p_A(t) = \det(tI - A)$$

and the solutions to $p_A(t) = 0$ are called eigenvalues. For $A$ a matrix and $p(t) = t^n + a_{n-1} t^{n-1} +
\cdots + a_1 t + a_0$, denote by $p(A)$ the matrix defined by

$$p(A) = A^n + a_{n-1} A^{n-1} + \cdots + a_1 A + a_0 I.$$

The explanation for the last term is that $A^0$ is interpreted as $I$, the identity matrix.

The Cayley Hamilton theorem states that every matrix satisfies its characteristic equation, that
equation defined by $p_A(t) = 0$. It is one of the most important theorems in linear algebra 1 . The
following lemma will help with its proof.

Lemma 7.3.2 Suppose for all $|\lambda|$ large enough,

$$A_0 + A_1 \lambda + \cdots + A_m \lambda^m = 0,$$

where the $A_i$ are $n \times n$ matrices. Then each $A_i = 0$.

Proof: Multiply by $\lambda^{-m}$ to obtain

$$A_0 \lambda^{-m} + A_1 \lambda^{-m+1} + \cdots + A_{m-1} \lambda^{-1} + A_m = 0.$$

Now let $|\lambda| \to \infty$ to obtain $A_m = 0$. With this, multiply by $\lambda$ to obtain

$$A_0 \lambda^{-m+1} + A_1 \lambda^{-m+2} + \cdots + A_{m-1} = 0.$$

Now let $|\lambda| \to \infty$ to obtain $A_{m-1} = 0$. Continue multiplying by $\lambda$ and letting $|\lambda| \to \infty$ to obtain that
all the $A_i = 0$. ■

With the lemma, here is a simple corollary. 

Corollary 7.3.3 Let $A_i$ and $B_i$ be $n \times n$ matrices and suppose

$$A_0 + A_1 \lambda + \cdots + A_m \lambda^m = B_0 + B_1 \lambda + \cdots + B_m \lambda^m$$

for all $|\lambda|$ large enough. Then $A_i = B_i$ for all $i$. Consequently if $\lambda$ is replaced by any $n \times n$ matrix,
the two sides will be equal. That is, for $C$ any $n \times n$ matrix,

$$A_0 + A_1 C + \cdots + A_m C^m = B_0 + B_1 C + \cdots + B_m C^m.$$

Proof: Subtract and use the result of the lemma. ■



1 A special case was first proved by Hamilton in 1853. The general case was announced by Cayley some time later 
and a proof was given by Frobenius in 1878. 
With this preparation, here is a relatively easy proof of the Cayley Hamilton theorem.

Theorem 7.3.4 Let $A$ be an $n \times n$ matrix and let $p(\lambda) = \det(\lambda I - A)$ be the characteristic
polynomial. Then $p(A) = 0$.

Proof: Let $C(\lambda)$ equal the transpose of the cofactor matrix of $(\lambda I - A)$ for $|\lambda|$ large. (If $|\lambda|$ is
large enough, then $\lambda$ cannot be in the finite list of eigenvalues of $A$ and so for such $\lambda$, $(\lambda I - A)^{-1}$
exists.) Therefore, by Theorem 7.2.14,

$$C(\lambda) = p(\lambda) (\lambda I - A)^{-1}.$$

Note that each entry in $C(\lambda)$ is a polynomial in $\lambda$ having degree no more than $n - 1$. Therefore,
collecting the terms,

$$C(\lambda) = C_0 + C_1 \lambda + \cdots + C_{n-1} \lambda^{n-1}$$

for $C_j$ some $n \times n$ matrix. It follows that for all $|\lambda|$ large enough,

$$(\lambda I - A) \left( C_0 + C_1 \lambda + \cdots + C_{n-1} \lambda^{n-1} \right) = p(\lambda) I$$

and so Corollary 7.3.3 may be used. It follows the matrix coefficients corresponding to equal powers
of $\lambda$ are equal on both sides of this equation. Therefore, if $\lambda$ is replaced with $A$, the two sides will
be equal. Thus

$$0 = (A - A) \left( C_0 + C_1 A + \cdots + C_{n-1} A^{n-1} \right) = p(A) I = p(A).$$

This proves the Cayley Hamilton theorem. ■
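
The theorem can be checked numerically. The sketch below is not the argument used in the proof: to obtain the coefficients of $\det(tI - A)$ it uses the Faddeev-LeVerrier recursion, a different and standard technique swapped in here only because it is short to code. All function names are hypothetical and the test matrix is arbitrary.

```python
from fractions import Fraction

def matmul(a, b):
    """Ordinary matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def char_poly(a):
    """Coefficients c[0..n] with det(tI - A) = sum of c[k] t^k (Faddeev-LeVerrier)."""
    n = len(a)
    c = [Fraction(0)] * (n + 1)
    c[n] = Fraction(1)
    m = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        am = matmul(a, m)
        # M_k = A M_{k-1} + c_{n-k+1} I
        m = [[am[i][j] + (c[n - k + 1] if i == j else 0) for j in range(n)]
             for i in range(n)]
        am = matmul(a, m)
        # c_{n-k} = -trace(A M_k) / k
        c[n - k] = -sum(am[i][i] for i in range(n)) / k
    return c

def poly_at_matrix(c, a):
    """Evaluate sum of c[k] A^k by Horner's scheme, with A^0 interpreted as I."""
    n = len(a)
    result = [[Fraction(0)] * n for _ in range(n)]
    for coeff in reversed(c):
        result = matmul(result, a)
        result = [[result[i][j] + (coeff if i == j else 0) for j in range(n)]
                  for i in range(n)]
    return result

A = [[1, 2, 0], [0, 1, 3], [4, 0, 1]]    # arbitrary 3 x 3 matrix
coefficients = char_poly(A)
p_of_A = poly_at_matrix(coefficients, A)
print(coefficients, p_of_A)
```

Every entry of `p_of_A` is zero, as the Cayley Hamilton theorem predicts.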
Rank Of A Matrix



8.1 Elementary Matrices 

The elementary matrices result from doing a row operation to the identity matrix. 
Definition 8.1.1 The row operations consist of the following 

1. Switch two rows. 

2. Multiply a row by a nonzero number. 

3. Replace a row by a multiple of another row added to it. 

The elementary matrices are given in the following definition. 

Definition 8.1.2 The elementary matrices consist of those matrices which result by applying a 
row operation to an identity matrix. Those which involve switching rows of the identity are called 
permutation matrices 1 . 

As an example of why these elementary matrices are interesting, consider the following. 

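$$\begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} a & b & c & d \\ x & y & z & w \\ f & g & h & i \end{pmatrix}
= \begin{pmatrix} x & y & z & w \\ a & b & c & d \\ f & g & h & i \end{pmatrix}$$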

A 3 x 4 matrix was multiplied on the left by an elementary matrix which was obtained from row 
operation 1 applied to the identity matrix. This resulted in applying the operation 1 to the given 
matrix. This is what happens in general. 




1 More generally, a permutation matrix is a matrix which comes by permuting the rows of the identity matrix, not 
just switching two rows. 
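Now consider what these elementary matrices look like. First consider the one which involves
switching row $i$ and row $j$ where $i < j$. This matrix is of the form

$$\begin{pmatrix}
1 & & & & & & \\
& \ddots & & & & & \\
& & 0 & \cdots & 1 & & \\
& & \vdots & \ddots & \vdots & & \\
& & 1 & \cdots & 0 & & \\
& & & & & \ddots & \\
& & & & & & 1
\end{pmatrix}$$

The two exceptional rows are shown. The $i^{th}$ row was the $j^{th}$ and the $j^{th}$ row was the $i^{th}$ in the
identity matrix. Now consider what this does to a column vector.

$$\begin{pmatrix}
1 & & & & & & \\
& \ddots & & & & & \\
& & 0 & \cdots & 1 & & \\
& & \vdots & \ddots & \vdots & & \\
& & 1 & \cdots & 0 & & \\
& & & & & \ddots & \\
& & & & & & 1
\end{pmatrix}
\begin{pmatrix} v_1 \\ \vdots \\ v_i \\ \vdots \\ v_j \\ \vdots \\ v_n \end{pmatrix}
= \begin{pmatrix} v_1 \\ \vdots \\ v_j \\ \vdots \\ v_i \\ \vdots \\ v_n \end{pmatrix}$$

Now denote by $P^{ij}$ the elementary matrix which comes from the identity from switching rows $i$ and
$j$. From what was just explained consider multiplication on the left by this elementary matrix.

$$P^{ij} \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1p} \\
\vdots & \vdots & & \vdots \\
a_{i1} & a_{i2} & \cdots & a_{ip} \\
\vdots & \vdots & & \vdots \\
a_{j1} & a_{j2} & \cdots & a_{jp} \\
\vdots & \vdots & & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{np}
\end{pmatrix}$$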
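From the way you multiply matrices this is a matrix which has the indicated columns.

$$\left( P^{ij} \begin{pmatrix} a_{11} \\ \vdots \\ a_{i1} \\ \vdots \\ a_{j1} \\ \vdots \\ a_{n1} \end{pmatrix},
P^{ij} \begin{pmatrix} a_{12} \\ \vdots \\ a_{i2} \\ \vdots \\ a_{j2} \\ \vdots \\ a_{n2} \end{pmatrix},
\cdots,
P^{ij} \begin{pmatrix} a_{1p} \\ \vdots \\ a_{ip} \\ \vdots \\ a_{jp} \\ \vdots \\ a_{np} \end{pmatrix} \right)
= \left( \begin{pmatrix} a_{11} \\ \vdots \\ a_{j1} \\ \vdots \\ a_{i1} \\ \vdots \\ a_{n1} \end{pmatrix},
\begin{pmatrix} a_{12} \\ \vdots \\ a_{j2} \\ \vdots \\ a_{i2} \\ \vdots \\ a_{n2} \end{pmatrix},
\cdots,
\begin{pmatrix} a_{1p} \\ \vdots \\ a_{jp} \\ \vdots \\ a_{ip} \\ \vdots \\ a_{np} \end{pmatrix} \right)$$

$$= \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1p} \\
\vdots & \vdots & & \vdots \\
a_{j1} & a_{j2} & \cdots & a_{jp} \\
\vdots & \vdots & & \vdots \\
a_{i1} & a_{i2} & \cdots & a_{ip} \\
\vdots & \vdots & & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{np}
\end{pmatrix}$$

This has established the following lemma.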

Lemma 8.1.3 Let $P^{ij}$ denote the elementary matrix which involves switching the $i^{th}$ and the $j^{th}$
rows. Then

$$P^{ij} A = B$$

where $B$ is obtained from $A$ by switching the $i^{th}$ and the $j^{th}$ rows.

Next consider the row operation which involves multiplying the i th row by a nonzero constant, 
c. The elementary matrix which results from applying this operation to the i th row of the identity 
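matrix is of the form

$$\begin{pmatrix}
1 & & & & \\
& \ddots & & & \\
& & c & & \\
& & & \ddots & \\
& & & & 1
\end{pmatrix}$$

Now consider what this does to a column vector.

$$\begin{pmatrix}
1 & & & & \\
& \ddots & & & \\
& & c & & \\
& & & \ddots & \\
& & & & 1
\end{pmatrix}
\begin{pmatrix} v_1 \\ \vdots \\ v_i \\ \vdots \\ v_n \end{pmatrix}
= \begin{pmatrix} v_1 \\ \vdots \\ c v_i \\ \vdots \\ v_n \end{pmatrix}$$

Denote by $E(c, i)$ this elementary matrix which multiplies the $i^{th}$ row of the identity by the nonzero
constant, $c$. Then from what was just discussed and the way matrices are multiplied,

$$E(c, i) \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1p} \\
\vdots & \vdots & & \vdots \\
a_{i1} & a_{i2} & \cdots & a_{ip} \\
\vdots & \vdots & & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{np}
\end{pmatrix}$$

equals a matrix having the columns indicated below.

$$\left( E(c, i) \begin{pmatrix} a_{11} \\ \vdots \\ a_{i1} \\ \vdots \\ a_{n1} \end{pmatrix},
E(c, i) \begin{pmatrix} a_{12} \\ \vdots \\ a_{i2} \\ \vdots \\ a_{n2} \end{pmatrix},
\cdots,
E(c, i) \begin{pmatrix} a_{1p} \\ \vdots \\ a_{ip} \\ \vdots \\ a_{np} \end{pmatrix} \right)
= \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1p} \\
\vdots & \vdots & & \vdots \\
c a_{i1} & c a_{i2} & \cdots & c a_{ip} \\
\vdots & \vdots & & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{np}
\end{pmatrix}$$

This proves the following lemma.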

Lemma 8.1.4 Let $E(c, i)$ denote the elementary matrix corresponding to the row operation in which
the $i^{th}$ row is multiplied by the nonzero constant, $c$. Thus $E(c, i)$ involves multiplying the $i^{th}$ row of
the identity matrix by $c$. Then

E(c,i)A = B 

where B is obtained from A by multiplying the i th row of A by c. 

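Finally consider the third of these row operations. Denote by $E(c \times i + j)$ the elementary matrix
which replaces the $j^{th}$ row with itself added to $c$ times the $i^{th}$ row. In case $i < j$ this
will be of the form

$$\begin{pmatrix}
1 & & & & & & \\
& \ddots & & & & & \\
& & 1 & & & & \\
& & \vdots & \ddots & & & \\
& & c & \cdots & 1 & & \\
& & & & & \ddots & \\
& & & & & & 1
\end{pmatrix}$$

Now consider what this does to a column vector.

$$\begin{pmatrix}
1 & & & & & & \\
& \ddots & & & & & \\
& & 1 & & & & \\
& & \vdots & \ddots & & & \\
& & c & \cdots & 1 & & \\
& & & & & \ddots & \\
& & & & & & 1
\end{pmatrix}
\begin{pmatrix} v_1 \\ \vdots \\ v_i \\ \vdots \\ v_j \\ \vdots \\ v_n \end{pmatrix}
= \begin{pmatrix} v_1 \\ \vdots \\ v_i \\ \vdots \\ c v_i + v_j \\ \vdots \\ v_n \end{pmatrix}$$

Now from this and the way matrices are multiplied,

$$E(c \times i + j) \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1p} \\
\vdots & \vdots & & \vdots \\
a_{i1} & a_{i2} & \cdots & a_{ip} \\
\vdots & \vdots & & \vdots \\
a_{j1} & a_{j2} & \cdots & a_{jp} \\
\vdots & \vdots & & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{np}
\end{pmatrix}$$

equals a matrix of the following form having the indicated columns.

$$\left( E(c \times i + j) \begin{pmatrix} a_{11} \\ \vdots \\ a_{i1} \\ \vdots \\ a_{j1} \\ \vdots \\ a_{n1} \end{pmatrix},
\cdots,
E(c \times i + j) \begin{pmatrix} a_{1p} \\ \vdots \\ a_{ip} \\ \vdots \\ a_{jp} \\ \vdots \\ a_{np} \end{pmatrix} \right)
= \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1p} \\
\vdots & \vdots & & \vdots \\
a_{i1} & a_{i2} & \cdots & a_{ip} \\
\vdots & \vdots & & \vdots \\
a_{j1} + c a_{i1} & a_{j2} + c a_{i2} & \cdots & a_{jp} + c a_{ip} \\
\vdots & \vdots & & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{np}
\end{pmatrix}$$

The case where $i > j$ is handled similarly. This proves the following lemma.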

Lemma 8.1.5 Let E (c x i + j) denote the elementary matrix obtained from I by replacing the j th 
row with c times the i th row added to it. Then 

E(cxi+j)A = B 

where B is obtained from A by replacing the j th row of A with itself added to c times the i th row of 
A. 

The next theorem is the main result. 

Theorem 8.1.6 To perform any of the three row operations on a matrix A it suffices to do the row 
operation on the identity matrix obtaining an elementary matrix E and then take the product, EA. 
Furthermore, each elementary matrix is invertible and its inverse is an elementary matrix. 
Proof: The first part of this theorem has been proved in Lemmas 8.1.3 - 8.1.5. It only remains
to verify the claim about the inverses. Consider first the elementary matrices corresponding to row
operation of type three.

$$E(-c \times i + j) E(c \times i + j) = I$$

This follows because $E(c \times i + j)$ is the identity with $c$ times row $i$ added to row $j$. When it is
multiplied on the left by $E(-c \times i + j)$ it follows from the first part of this theorem that you take
the $i^{th}$ row of $E(c \times i + j)$, which coincides with the $i^{th}$ row of $I$ since that row was not changed,
multiply it by $-c$ and add to the $j^{th}$ row of $E(c \times i + j)$, which was the $j^{th}$ row of $I$ added to $c$
times the $i^{th}$ row of $I$. Thus $E(-c \times i + j)$ multiplied on the left undoes the row operation which
resulted in $E(c \times i + j)$. The same argument applied to the product

$$E(c \times i + j) E(-c \times i + j)$$

replacing $c$ with $-c$ in the argument yields that this product is also equal to $I$. Therefore,
$E(c \times i + j)^{-1} = E(-c \times i + j)$.

Similar reasoning shows that for $E(c, i)$ the elementary matrix which comes from multiplying
the $i^{th}$ row by the nonzero constant, $c$,

$$E(c, i)^{-1} = E(c^{-1}, i).$$

Finally, consider $P^{ij}$ which involves switching the $i^{th}$ and the $j^{th}$ rows.

$$P^{ij} P^{ij} = I$$

because by the first part of this theorem, multiplying on the left by $P^{ij}$ switches the $i^{th}$ and $j^{th}$ rows
of $P^{ij}$ which was obtained from switching the $i^{th}$ and $j^{th}$ rows of the identity. First you switch them
to get $P^{ij}$ and then you multiply on the left by $P^{ij}$ which switches these rows again and restores
the identity matrix. Thus $(P^{ij})^{-1} = P^{ij}$. ■
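
Theorem 8.1.6 is easy to test by machine. The following Python sketch builds the three kinds of elementary matrices from the identity and multiplies on the left. The function names (`switch`, `scale`, `add_multiple`) and the 0-based row indices are conventions assumed for this example only; they stand in for $P^{ij}$, $E(c, i)$ and $E(c \times i + j)$.

```python
from fractions import Fraction

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def matmul(a, b):
    """Ordinary matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def switch(n, i, j):
    """P^{ij}: the identity with rows i and j switched."""
    e = identity(n)
    e[i], e[j] = e[j], e[i]
    return e

def scale(n, c, i):
    """E(c, i): the identity with row i multiplied by the nonzero constant c."""
    e = identity(n)
    e[i][i] = c
    return e

def add_multiple(n, c, i, j):
    """E(c x i + j): the identity with c times row i added to row j."""
    e = identity(n)
    e[j][i] = c
    return e

A = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]

switched = matmul(switch(3, 0, 1), A)        # rows 0 and 1 of A trade places
scaled = matmul(scale(3, 5, 1), A)           # row 1 of A is multiplied by 5
added = matmul(add_multiple(3, 2, 0, 2), A)  # 2 * (row 0) is added to row 2

# Each elementary matrix is undone by an elementary matrix of the same type.
undo_add = matmul(add_multiple(3, -2, 0, 2), add_multiple(3, 2, 0, 2))
undo_scale = matmul(scale(3, Fraction(1, 5), 1), scale(3, 5, 1))
undo_switch = matmul(switch(3, 0, 1), switch(3, 0, 1))
print(switched, scaled, added)
```

The three `undo_*` products all equal the identity, which is the claim about inverses in the theorem.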

8.2 THE Row Reduced Echelon Form Of A Matrix 

Recall that putting a matrix in row reduced echelon form involves doing row operations as described 
on Page 83. In this section we review the description of the row reduced echelon form and prove the 
row reduced echelon form for a given matrix is unique. That is, every matrix can be row reduced to 
a unique row reduced echelon form. Of course this is not true of the echelon form. The significance 
of this is that it becomes possible to use the definite article in referring to the row reduced echelon 
form and hence important conclusions about the original matrix may be logically deduced from an 
examination of its unique row reduced echelon form. First we need the following definition of some 
terminology. 

Definition 8.2.1 Let $v_1, \cdots, v_k, u$ be vectors. Then $u$ is said to be a linear combination of the
vectors $\{v_1, \cdots, v_k\}$ if there exist scalars, $c_1, \cdots, c_k$ such that

$$u = \sum_{i=1}^{k} c_i v_i.$$

The collection of all linear combinations of the vectors, $\{v_1, \cdots, v_k\}$ is known as the span of these
vectors and is written as $\operatorname{span}(v_1, \cdots, v_k)$.

Another way to say the same thing as expressed in the earlier definition of row reduced echelon 
form found on Page 80 is the following which is a more useful description when proving the major 
assertions about the row reduced echelon form. 

Definition 8.2.2 Let $e_i$ denote the column vector which has all zero entries except for the $i^{th}$ slot
which is one. An $m \times n$ matrix is said to be in row reduced echelon form if in viewing successive
columns from left to right, the first nonzero column encountered is $e_1$ and if you have encountered
$e_1, e_2, \cdots, e_k$, the next column is either $e_{k+1}$ or is a linear combination of the vectors, $e_1, e_2, \cdots, e_k$.

Theorem 8.2.3 Let A be an m x n matrix. Then A has a row reduced echelon form determined by 
a simple process. 

Proof: Viewing the columns of $A$ from left to right take the first nonzero column. Pick a nonzero
entry in this column and switch the row containing this entry with the top row of $A$. Now divide
this new top row by the value of this nonzero entry to get a 1 in this position and then use row
operations to make all entries below this equal to zero. Thus the first nonzero column is now $e_1$.
Denote the resulting matrix by $A_1$. Consider the sub-matrix of $A_1$ to the right of this column and
below the first row. Do exactly the same thing for this sub-matrix that was done for $A$. This time
the $e_1$ will refer to $\mathbb{F}^{m-1}$. Use the first 1 obtained by the above process which is in the top row of
this sub-matrix and row operations to zero out every entry above it in the rows of $A_1$. Call the
resulting matrix $A_2$. Thus $A_2$ satisfies the conditions of the above definition up to the column just
encountered. Continue this way till every column has been dealt with and the result must be in row
reduced echelon form. ■
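
The process in this proof can be turned into a short program. The sketch below is one possible rendering in Python with exact fractions, under a small simplifying assumption: as soon as a pivot is found, the entries above and below it are cleared in a single pass (the usual Gauss Jordan ordering) rather than in the two stages described in the proof. The function name `rref` and the 0-based column numbers it reports are choices made for this example.

```python
from fractions import Fraction

def rref(a):
    """Return (R, pivots): a row reduced echelon form of a and its pivot column indices."""
    m = [[Fraction(x) for x in row] for row in a]
    rows, cols = len(m), len(m[0])
    pivots = []
    r = 0                                   # row where the next pivot will be placed
    for c in range(cols):
        # Look for a nonzero entry in column c at or below row r.
        p = next((i for i in range(r, rows) if m[i][c] != 0), None)
        if p is None:
            continue                        # no pivot in this column
        m[r], m[p] = m[p], m[r]             # row operation 1: switch two rows
        pivot = m[r][c]
        m[r] = [x / pivot for x in m[r]]    # row operation 2: scale to get a leading 1
        for i in range(rows):               # row operation 3: clear the rest of the column
            if i != r and m[i][c] != 0:
                factor = m[i][c]
                m[i] = [x - factor * y for x, y in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == rows:
            break
    return m, pivots

A = [[0, 2, 4], [1, 1, 1], [2, 4, 6]]       # arbitrary matrix for illustration
R, pivots = rref(A)
print(R, pivots)
```

Only the three row operations of Definition 8.1.1 are used, so the output is row equivalent to the input.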

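The following diagram illustrates the above procedure. Say the matrix looked something like the
following.

$$\begin{pmatrix} 0 & * & * & * & * & * & * \\ 0 & * & * & * & * & * & * \\ 0 & * & * & * & * & * & * \end{pmatrix}$$

First step would yield something like

$$\begin{pmatrix} 0 & 1 & * & * & * & * & * \\ 0 & 0 & * & * & * & * & * \\ 0 & 0 & * & * & * & * & * \end{pmatrix}$$

For the second step you look at the lower right corner as described,

$$\begin{pmatrix} * & * & * & * & * \\ * & * & * & * & * \end{pmatrix}$$

and if the first column consists of all zeros but the next one is not all zeros, you would get something
like this.

$$\begin{pmatrix} 0 & 1 & * & * & * \\ 0 & 0 & * & * & * \end{pmatrix}$$

Thus, after zeroing out the term in the top row above the 1, you get the following for the next step
in the computation of the row reduced echelon form for the original matrix.

$$\begin{pmatrix} 0 & 1 & * & 0 & * & * & * \\ 0 & 0 & 0 & 1 & * & * & * \\ 0 & 0 & 0 & 0 & * & * & * \end{pmatrix}$$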



Next you look at the lower right matrix below the top two rows and to the right of the first four 
columns and repeat the process. 

Recall the following definition which was discussed earlier. 

Definition 8.2.4 The first pivot column of $A$ is the first nonzero column of $A$. The next pivot
column is the first column after this which becomes $e_2$ in the row reduced echelon form. The third is
the next column which becomes $e_3$ in the row reduced echelon form and so forth.

There are three choices for row operations at each step in the above theorem. A natural question 
is whether the same row reduced echelon matrix always results in the end from following the above 
algorithm applied in any way. The next corollary says this is the case but first, here is a fundamental 
lemma. 

In rough terms, the following lemma states that linear relationships between columns in a 
matrix are preserved by row operations. This simple lemma is the main result in understanding all 
the major questions related to the row reduced echelon form as well as many other topics. 
Lemma 8.2.5 Let $A$ and $B$ be two $m \times n$ matrices and suppose $B$ results from a row operation
applied to $A$. Then the $k^{th}$ column of $B$ is a linear combination of the $i_1, \cdots, i_r$ columns of $B$ if and
only if the $k^{th}$ column of $A$ is a linear combination of the $i_1, \cdots, i_r$ columns of $A$. Furthermore, the
scalars in the linear combination are the same. (The linear relationship between the $k^{th}$ column of
$A$ and the $i_1, \cdots, i_r$ columns of $A$ is the same as the linear relationship between the $k^{th}$ column of
$B$ and the $i_1, \cdots, i_r$ columns of $B$.)

Proof: Let $A$ equal the following matrix in which the $a_k$ are the columns

$$\begin{pmatrix} a_1 & a_2 & \cdots & a_n \end{pmatrix}$$

and let $B$ equal the following matrix in which the columns are given by the $b_k$

$$\begin{pmatrix} b_1 & b_2 & \cdots & b_n \end{pmatrix}.$$

Then by Theorem 8.1.6 on Page 171 $b_k = E a_k$ where $E$ is an elementary matrix. Suppose then
that one of the columns of $A$ is a linear combination of some other columns of $A$. Say

$$a_k = \sum_{r \in S} c_r a_r.$$

Then multiplying by $E$,

$$b_k = E a_k = \sum_{r \in S} c_r E a_r = \sum_{r \in S} c_r b_r.$$

The converse follows in the same way because, by Theorem 8.1.6, $E^{-1}$ is also an elementary matrix
and $a_k = E^{-1} b_k$. ■
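
A small numerical experiment illustrates Lemma 8.2.5. In the sketch below the third column of a matrix is built as a known combination of the first two, each of the three row operations is applied, and the same relation is tested again. The helper names and the particular matrix are assumptions made for this illustration; rows and columns are numbered from 0.

```python
def column(a, k):
    return [row[k] for row in a]

def satisfies_relation(a, k, indices, scalars):
    """True when column k of a equals the sum of scalars[t] times column indices[t]."""
    combo = [sum(c * row[i] for c, i in zip(scalars, indices)) for row in a]
    return column(a, k) == combo

def switch_rows(a, i, j):
    b = [row[:] for row in a]
    b[i], b[j] = b[j], b[i]
    return b

def scale_row(a, c, i):
    b = [row[:] for row in a]
    b[i] = [c * x for x in b[i]]
    return b

def add_multiple(a, c, i, j):
    """Replace row j with itself plus c times row i."""
    b = [row[:] for row in a]
    b[j] = [x + c * y for x, y in zip(b[j], b[i])]
    return b

# Third column = 2 * (first column) - (second column).
A = [[1, 2, 0], [3, 1, 5], [2, 0, 4]]
results = [switch_rows(A, 0, 2), scale_row(A, -3, 1), add_multiple(A, 4, 0, 1)]
checks = [satisfies_relation(B, 2, [0, 1], [2, -1]) for B in results]
print(checks)
```

The relation, with the same scalars 2 and -1, survives every row operation.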



Definition 8.2.6 Two matrices are said to be row equivalent if one can be obtained from the 
other by a sequence of row operations. 

It has been shown above that every matrix is row equivalent to one which is in row reduced
echelon form. Note

$$\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = x_1 e_1 + \cdots + x_n e_n$$

so to say two column vectors are equal is to say they are the same linear combination of the special
vectors $e_j$.

Corollary 8.2.7 The row reduced echelon form is unique. That is if B,C are two matrices in row 
reduced echelon form and both are row equivalent to A, then B = C. 

Proof: Suppose $B$ and $C$ are both row reduced echelon forms for the matrix $A$. Then they
clearly have the same zero columns since row operations leave zero columns unchanged. If $B$ has
the sequence $e_1, e_2, \cdots, e_r$ occurring for the first time in the positions, $i_1, i_2, \cdots, i_r$, the description
of the row reduced echelon form means that each of these columns is not a linear combination of
the preceding columns. Therefore, by Lemma 8.2.5, the same is true of the columns in positions
$i_1, i_2, \cdots, i_r$ for $C$. It follows from the description of the row reduced echelon form that $e_1, \cdots, e_r$
occur respectively for the first time in columns $i_1, i_2, \cdots, i_r$ for $C$. Therefore, both $B$ and $C$ have the
sequence $e_1, e_2, \cdots, e_r$ occurring for the first time in the positions, $i_1, i_2, \cdots, i_r$. By Lemma 8.2.5,
the columns between the $i_k$ and $i_{k+1}$ position in the two matrices are linear combinations involving
the same scalars of the columns in the $i_1, \cdots, i_k$ position. Also the columns after the $i_r$ position
are linear combinations of the columns in the $i_1, \cdots, i_r$ positions involving the same scalars in both
matrices. This is equivalent to the assertion that each of these columns is identical and this proves
the corollary. ■
The above corollary shows that you can determine whether two matrices are row equivalent by
simply checking their row reduced echelon forms. The matrices are row equivalent if and only if 
they have the same row reduced echelon form. 

Now with the above corollary, here is a very fundamental observation. It concerns a matrix which 
looks like this: (More columns than rows.) 



Corollary 8.2.8 Suppose $A$ is an $m \times n$ matrix and that $m < n$. That is, the number of rows is less
than the number of columns. Then one of the columns of $A$ is a linear combination of the preceding
columns of $A$. Also, there exists a nonzero solution $x$ to the equation $Ax = 0$.

Proof: Since m < n, not all the columns of A can be pivot columns. In reading from left to 
right, pick the first one which is not a pivot column. Then from the description of the row reduced 
echelon form, this column is a linear combination of the preceding columns. Denote the $j^{th}$ column
of $A$ by $a_j$. Thus for some $k > 1$,

$$a_k = \sum_{j=1}^{k-1} x_j a_j, \quad \text{that is,} \quad \sum_{j=1}^{k-1} x_j a_j - a_k = 0.$$

Let $x = (x_1, \cdots, x_{k-1}, -1, 0, \cdots, 0)^T$. Then $Ax = 0$. ■

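Example 8.2.9 Find the row reduced echelon form of the matrix

$$\begin{pmatrix} 0 & 0 & 2 & 3 \\ 0 & 2 & 0 & 1 \\ 0 & 1 & 1 & 5 \end{pmatrix}$$

The first nonzero column is the second in the matrix. We switch the third and first rows to
obtain

$$\begin{pmatrix} 0 & 1 & 1 & 5 \\ 0 & 2 & 0 & 1 \\ 0 & 0 & 2 & 3 \end{pmatrix}$$

Now we multiply the top row by $-2$ and add to the second.

$$\begin{pmatrix} 0 & 1 & 1 & 5 \\ 0 & 0 & -2 & -9 \\ 0 & 0 & 2 & 3 \end{pmatrix}$$

Next, add the second row to the bottom and then divide the bottom row by $-6$.

$$\begin{pmatrix} 0 & 1 & 1 & 5 \\ 0 & 0 & -2 & -9 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

Next use the bottom row to obtain zeros in the last column above the 1 and divide the second row
by $-2$.

$$\begin{pmatrix} 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

Finally, add $-1$ times the middle row to the top.

$$\begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$$

This is in row reduced echelon form.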

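Example 8.2.10 Find the row reduced echelon form for the matrix

$$\begin{pmatrix} 1 & 2 & 0 & 2 \\ -1 & 3 & 4 & 3 \\ 0 & 5 & 4 & 5 \end{pmatrix}$$

You should verify that the row reduced echelon form is

$$\begin{pmatrix} 1 & 0 & -\frac{8}{5} & 0 \\ 0 & 1 & \frac{4}{5} & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix}$$
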

Having developed the row reduced echelon form, it is now easy to verify that the right inverse 
found earlier using the Gauss Jordan procedure is the inverse. 
Theorem 8.2.11 Suppose $A, B$ are $n \times n$ matrices and $AB = I$. Then it follows that $BA = I$ also,
and so $B = A^{-1}$. For $n \times n$ matrices, the left inverse, right inverse and inverse are all the same
thing.

Proof. If $AB = I$ for $A, B$ $n \times n$ matrices, is $BA = I$? If $AB = I$, there exists a unique solution
$x$ to the equation

$$Bx = y$$

for any choice of $y$. In fact,

$$x = A(Bx) = Ay.$$

This means the row reduced echelon form of $B$ must be $I$. Thus every column is a pivot column.
Otherwise, there exists a free variable and the solution, if it exists, would not be unique, contrary
to what was just shown must happen if $AB = I$. It follows that a right inverse $B^{-1}$ for $B$ exists.
The Gauss Jordan procedure for finding the inverse yields

$$\begin{pmatrix} B & I \end{pmatrix} \to \begin{pmatrix} I & B^{-1} \end{pmatrix}.$$

Now multiply both sides of the equation $AB = I$ on the right by $B^{-1}$. Then

$$A = A(BB^{-1}) = (AB)B^{-1} = B^{-1}.$$

Thus $A$ is the right inverse of $B$, and so $BA = I$. This shows that if $AB = I$, then $BA = I$ also.
Exchanging roles of $A$ and $B$, we see that if $BA = I$, then $AB = I$. ■

8.3 The Rank Of A Matrix 

8.3.1 The Definition Of Rank 

To begin, here is a definition to introduce some terminology. 

Definition 8.3.1 Let A be an m x n matrix. The column space of A is the span of the columns. 
The row space is the span of the rows. 

There are three definitions of the rank of a matrix which are useful. These are given in the 
following definition. It turns out that the concept of determinant rank is often important but is 
virtually impossible to find directly. The other two concepts of rank are very easily determined and 
it is a happy fact that all three yield the same number. This is shown later. 

Definition 8.3.2 A sub-matrix of a matrix A is a rectangular array of numbers obtained by 
deleting some rows and columns of A. Let A be an m x n matrix. The determinant rank of the 
matrix equals r where r is the largest number such that some r x r sub-matrix of A has a non zero 
determinant. The row space of a matrix is the span of the rows and the column space of a matrix 
is the span of the columns. The row rank of a matrix is the number of nonzero rows in the row
reduced echelon form and the column rank is the number of columns in the row reduced echelon form
which are one of the $e_k$ vectors. Thus the column rank equals the number of pivot columns. It follows
the row rank equals the column rank. This is also called the rank of the matrix. The rank of a matrix 
A is denoted by rank (A) . 



Example 8.3.3 Consider the matrix

$$\begin{pmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \end{pmatrix}$$

What is its rank?

You could look at all the $2 \times 2$ submatrices

$$\begin{pmatrix} 1 & 2 \\ 2 & 4 \end{pmatrix}, \quad \begin{pmatrix} 1 & 3 \\ 2 & 6 \end{pmatrix}, \quad \begin{pmatrix} 2 & 3 \\ 4 & 6 \end{pmatrix}.$$

Each has determinant equal to 0. Therefore, the rank is less than 2. Now look at the $1 \times 1$
submatrices. There exists one of these which has nonzero determinant. For example $(1)$ has determinant
equal to 1 and so the rank of this matrix equals 1.

Of course this example was pretty easy but what if you had a $4 \times 7$ matrix? You would have to
consider all the $4 \times 4$ submatrices and then all the $3 \times 3$ submatrices and then all the $2 \times 2$ matrices
and finally all the $1 \times 1$ matrices in order to compute the rank. Clearly this is not practical. The
following theorem, which is proved later, will remove the difficulties just indicated.

Theorem 8.3.4 Let A be an m x n matrix. Then the row rank, column rank and determinant rank 
are all the same. 
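
To see Theorem 8.3.4 at work, the following sketch computes the determinant rank by brute force over all square sub-matrices and compares it with the number of pivots found by row operations. It is only an illustration: the function names are hypothetical, the brute force search is exactly the impractical method described above, and the pivot count uses forward elimination, which finds the same pivot columns as the row reduced echelon form.

```python
from fractions import Fraction
from itertools import combinations

def det(a):
    """Determinant by cofactor expansion along the first row."""
    if len(a) == 1:
        return a[0][0]
    return sum((-1) ** j * a[0][j] *
               det([row[:j] + row[j + 1:] for row in a[1:]])
               for j in range(len(a)))

def determinant_rank(a):
    """Largest r such that some r x r sub-matrix has nonzero determinant."""
    rows, cols = len(a), len(a[0])
    for r in range(min(rows, cols), 0, -1):
        for ri in combinations(range(rows), r):
            for ci in combinations(range(cols), r):
                if det([[a[i][j] for j in ci] for i in ri]) != 0:
                    return r
    return 0

def pivot_rank(a):
    """Number of pivot columns, found by forward elimination with exact fractions."""
    m = [[Fraction(x) for x in row] for row in a]
    rank = 0
    for c in range(len(m[0])):
        p = next((i for i in range(rank, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[rank], m[p] = m[p], m[rank]
        for i in range(rank + 1, len(m)):
            factor = m[i][c] / m[rank][c]
            m[i] = [x - factor * y for x, y in zip(m[i], m[rank])]
        rank += 1
        if rank == len(m):
            break
    return rank

small = [[1, 2, 3], [2, 4, 6]]
wide = [[1, 2, 1, 3, 2], [1, 3, 6, 0, 2], [3, 7, 8, 6, 6]]   # third row = 2*(first) + second
print(determinant_rank(small), pivot_rank(small))
print(determinant_rank(wide), pivot_rank(wide))
```

The two numbers agree for each matrix, while the elimination route does far less work.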



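Example 8.3.5 Find the rank of the matrix

$$\begin{pmatrix} 1 & 2 & 1 & 3 & 0 \\ -4 & 3 & 2 & 1 & 2 \\ 3 & 2 & 1 & 6 & 5 \\ 4 & -3 & -2 & 1 & 7 \end{pmatrix}$$

From the above definition, all you have to do is find the row reduced echelon form and then
count up the number of nonzero rows. But the row reduced echelon form of this matrix is

$$\begin{pmatrix} 1 & 0 & 0 & 0 & -\frac{17}{4} \\ 0 & 1 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 & -\frac{45}{4} \\ 0 & 0 & 0 & 1 & \frac{9}{2} \end{pmatrix}$$

and so the rank of this matrix is 4.
Find the rank of the matrix

$$\begin{pmatrix} 1 & 2 & 1 & 3 & 0 \\ -4 & 3 & 2 & 1 & 2 \\ 3 & 2 & 1 & 6 & 5 \\ 0 & 7 & 4 & 10 & 7 \end{pmatrix}$$

The row reduced echelon form is

$$\begin{pmatrix} 1 & 0 & 0 & \frac{3}{2} & \frac{5}{2} \\ 0 & 1 & 0 & -4 & -17 \\ 0 & 0 & 1 & \frac{19}{2} & \frac{63}{2} \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}$$

and so this time the rank is 3.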

8.3.2 Finding The Row And Column Space Of A Matrix 

The row reduced echelon form also can be used to obtain an efficient description of the row and 
column space of a matrix. Of course you can get the column space by simply saying that it equals 
the span of all the columns but often you can get the column space as the span of fewer columns than
this. This is what we mean by an "efficient description". This is illustrated in the next example.

Example 8.3.6 Find the rank of the following matrix and describe the column and row spaces
efficiently.

$$\begin{pmatrix} 1 & 2 & 1 & 3 & 2 \\ 1 & 3 & 6 & 0 & 2 \\ 3 & 7 & 8 & 6 & 6 \end{pmatrix} \qquad (8.1)$$

The row reduced echelon form is

$$\begin{pmatrix} 1 & 0 & -9 & 9 & 2 \\ 0 & 1 & 5 & -3 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}$$

Therefore, the rank of this matrix equals 2. All columns of this row reduced echelon form are in

$$\operatorname{span} \left( \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} \right).$$

For example,

$$\begin{pmatrix} -9 \\ 5 \\ 0 \end{pmatrix} = -9 \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} + 5 \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}.$$

By Lemma 8.2.5, all columns of the original matrix are similarly contained in the span of the first
two columns of that matrix. For example, consider the third column of the original matrix.

$$\begin{pmatrix} 1 \\ 6 \\ 8 \end{pmatrix} = -9 \begin{pmatrix} 1 \\ 1 \\ 3 \end{pmatrix} + 5 \begin{pmatrix} 2 \\ 3 \\ 7 \end{pmatrix}.$$

How did I know to use $-9$ and $5$ for the coefficients? This is what Lemma 8.2.5 says! It says linear
relationships are all preserved. Therefore, the column space of the original matrix equals the span
of the first two columns. This is the desired efficient description of the column space.

What about an efficient description of the row space? When row operations are used, the resulting
vectors remain in the row space. Thus the rows in the row reduced echelon form are in the row space
of the original matrix. Furthermore, by reversing the row operations, each row of the original matrix
can be obtained as a linear combination of the rows in the row reduced echelon form. It follows that
the span of the nonzero rows in the row reduced echelon matrix equals the span of the original rows.
In the above example, the row space equals the span of the two vectors, $(1 \ 0 \ -9 \ 9 \ 2)$ and
$(0 \ 1 \ 5 \ -3 \ 0)$.

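Example 8.3.7 Find the rank of the following matrix and describe the column and row spaces
efficiently.

$$\begin{pmatrix} 1 & 2 & 1 & 3 & 2 \\ 1 & 3 & 6 & 0 & 2 \\ 1 & 2 & 1 & 3 & 2 \\ 1 & 3 & 2 & 4 & 0 \end{pmatrix}$$

The row reduced echelon form is

$$\begin{pmatrix} 1 & 0 & 0 & 0 & \frac{13}{2} \\ 0 & 1 & 0 & 2 & -\frac{5}{2} \\ 0 & 0 & 1 & -1 & \frac{1}{2} \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}$$

and so the rank is 3, the row space is the span of the vectors,

$$\left( 0 \ 0 \ 1 \ -1 \ \tfrac{1}{2} \right), \quad \left( 0 \ 1 \ 0 \ 2 \ -\tfrac{5}{2} \right), \quad \left( 1 \ 0 \ 0 \ 0 \ \tfrac{13}{2} \right),$$

and the column space is the span of the first three columns in the original matrix,

$$\operatorname{span} \left( \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix}, \begin{pmatrix} 2 \\ 3 \\ 2 \\ 3 \end{pmatrix}, \begin{pmatrix} 1 \\ 6 \\ 1 \\ 2 \end{pmatrix} \right).$$
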

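Example 8.3.8 Find the rank of the following matrix and describe the column and row spaces
efficiently.

$$\begin{pmatrix} 1 & 2 & 3 & 0 & 1 \\ 2 & 1 & 3 & 2 & 4 \\ -1 & 2 & 1 & 3 & 1 \end{pmatrix}$$

The row reduced echelon form is

$$\begin{pmatrix} 1 & 0 & 1 & 0 & \frac{21}{17} \\ 0 & 1 & 1 & 0 & -\frac{2}{17} \\ 0 & 0 & 0 & 1 & \frac{14}{17} \end{pmatrix}$$

It follows the rank is three and the column space is the span of the first, second and fourth columns
of the original matrix,

$$\operatorname{span} \left( \begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix}, \begin{pmatrix} 2 \\ 1 \\ 2 \end{pmatrix}, \begin{pmatrix} 0 \\ 2 \\ 3 \end{pmatrix} \right)$$

while the row space is the span of the vectors

$$\left( 1 \ 0 \ 1 \ 0 \ \tfrac{21}{17} \right), \quad \left( 0 \ 1 \ 1 \ 0 \ -\tfrac{2}{17} \right), \quad \left( 0 \ 0 \ 0 \ 1 \ \tfrac{14}{17} \right).$$
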



Procedure 8.3.9 To find the rank of a matrix, obtain the row reduced echelon form for the matrix. 
Then count the number of nonzero rows or equivalently the number of pivot columns. This is the 
rank. The row space is the span of the nonzero rows in the row reduced echelon form and the column 
space is the span of the pivot columns of the original matrix. 
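
Procedure 8.3.9 can be sketched in code as follows. This is an illustration under stated assumptions rather than a definitive implementation: `rref` and `rank_and_spaces` are hypothetical names, arithmetic is done with exact fractions, and columns are numbered from 0. The matrix used is the $3 \times 5$ matrix whose row space was described above as the span of $(1\ 0\ -9\ 9\ 2)$ and $(0\ 1\ 5\ -3\ 0)$.

```python
from fractions import Fraction

def rref(a):
    """Return (R, pivots): the row reduced echelon form and its pivot column indices."""
    m = [[Fraction(x) for x in row] for row in a]
    pivots, r = [], 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        pivot = m[r][c]
        m[r] = [x / pivot for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c]
                m[i] = [x - factor * y for x, y in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

def rank_and_spaces(a):
    """Rank, spanning rows for the row space, spanning columns for the column space."""
    r, pivots = rref(a)
    row_vectors = [row for row in r if any(x != 0 for x in row)]  # nonzero rows of R
    column_vectors = [[row[c] for row in a] for c in pivots]      # pivot columns of A itself
    return len(pivots), row_vectors, column_vectors

A = [[1, 2, 1, 3, 2], [1, 3, 6, 0, 2], [3, 7, 8, 6, 6]]
rank, row_vectors, column_vectors = rank_and_spaces(A)
print(rank, row_vectors, column_vectors)
```

Note that the column vectors are taken from the original matrix, not from its row reduced echelon form, exactly as the procedure requires.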

8.4 Linear Independence And Bases 

8.4.1 Linear Independence And Dependence 

First we consider the concept of linear independence. We define what it means for vectors in $\mathbb{F}^n$ to be linearly independent and then give equivalent descriptions. In the following definition, the symbol

\[ \begin{pmatrix} \mathbf{v}_1 & \mathbf{v}_2 & \cdots & \mathbf{v}_k \end{pmatrix} \]

denotes the matrix which has the vector $\mathbf{v}_1$ as the first column, $\mathbf{v}_2$ as the second column, and so forth until $\mathbf{v}_k$ is the $k$th column.

Definition 8.4.1 Let $\{\mathbf{v}_1, \cdots, \mathbf{v}_k\}$ be vectors in $\mathbb{F}^n$. Then this collection of vectors is said to be linearly independent if each of the columns of the $n \times k$ matrix

\[ \begin{pmatrix} \mathbf{v}_1 & \mathbf{v}_2 & \cdots & \mathbf{v}_k \end{pmatrix} \]

is a pivot column. Thus the row reduced echelon form for this matrix is

\[ \begin{pmatrix} \mathbf{e}_1 & \mathbf{e}_2 & \cdots & \mathbf{e}_k \end{pmatrix}. \]

The question whether any vector in the first $k$ columns in a matrix is a pivot column is independent of the presence of later columns. Thus each of $\{\mathbf{v}_1, \cdots, \mathbf{v}_k\}$ is a pivot column in

\[ \begin{pmatrix} \mathbf{v}_1 & \mathbf{v}_2 & \cdots & \mathbf{v}_k \end{pmatrix} \]

if and only if these vectors are each pivot columns in

\[ \begin{pmatrix} \mathbf{v}_1 & \mathbf{v}_2 & \cdots & \mathbf{v}_k & \mathbf{w}_1 & \cdots & \mathbf{w}_r \end{pmatrix}. \]

Here is what the linear independence means in terms of linear relationships. 

Corollary 8.4.2 The collection of vectors $\{\mathbf{v}_1, \cdots, \mathbf{v}_k\}$ is linearly independent if and only if none of these vectors is a linear combination of the others.

Proof: If $\{\mathbf{v}_1, \cdots, \mathbf{v}_k\}$ is linearly independent, then every column in

\[ \begin{pmatrix} \mathbf{v}_1 & \mathbf{v}_2 & \cdots & \mathbf{v}_k \end{pmatrix} \]

is a pivot column, which requires that the row reduced echelon form is

\[ \begin{pmatrix} \mathbf{e}_1 & \mathbf{e}_2 & \cdots & \mathbf{e}_k \end{pmatrix}. \]

Now none of the $\mathbf{e}_i$ vectors is a linear combination of the others. By Lemma 8.2.5 on Page 175 none of the $\mathbf{v}_i$ is a linear combination of the others. Recall this lemma says linear relationships between the columns are preserved under row operations.

Next suppose none of the vectors $\{\mathbf{v}_1, \cdots, \mathbf{v}_k\}$ is a linear combination of the others. Then none of the columns in

\[ \begin{pmatrix} \mathbf{v}_1 & \mathbf{v}_2 & \cdots & \mathbf{v}_k \end{pmatrix} \]

is a linear combination of the others. By Lemma 8.2.5 the same is true of the row reduced echelon form for this matrix. From the description of the row reduced echelon form, it follows that the $i$th column of the row reduced echelon form must be $\mathbf{e}_i$ since otherwise it would be a linear combination of the first $i-1$ vectors $\mathbf{e}_1, \cdots, \mathbf{e}_{i-1}$, and by Lemma 8.2.5 it follows $\mathbf{v}_i$ would be the same linear combination of $\mathbf{v}_1, \cdots, \mathbf{v}_{i-1}$, contrary to the assumption that none of the columns in $\begin{pmatrix} \mathbf{v}_1 & \mathbf{v}_2 & \cdots & \mathbf{v}_k \end{pmatrix}$ is a linear combination of the others. Therefore, each of the $k$ columns in $\begin{pmatrix} \mathbf{v}_1 & \mathbf{v}_2 & \cdots & \mathbf{v}_k \end{pmatrix}$ is a pivot column and so $\{\mathbf{v}_1, \cdots, \mathbf{v}_k\}$ is linearly independent. ■
Corollary 8.4.3 The collection of vectors $\{\mathbf{v}_1, \cdots, \mathbf{v}_k\}$ is linearly independent if and only if whenever

\[ \sum_{i=1}^{k} c_i \mathbf{v}_i = \mathbf{0} \]

it follows each $c_i = 0$.

Proof: Suppose first $\{\mathbf{v}_1, \cdots, \mathbf{v}_k\}$ is linearly independent. Then by Corollary 8.4.2, none of the vectors is a linear combination of the others. Now suppose

\[ \sum_{i=1}^{k} c_i \mathbf{v}_i = \mathbf{0} \]

and not all the $c_i = 0$. Then pick $c_i$ which is not zero, divide by it and solve for $\mathbf{v}_i$ in terms of the other $\mathbf{v}_j$, contradicting the fact that none of the $\mathbf{v}_i$ equals a linear combination of the others.

Now suppose the condition about the sum holds. If $\mathbf{v}_i$ is a linear combination of the other vectors in the list, then you could obtain an equation of the form

\[ \mathbf{v}_i = \sum_{j \neq i} c_j \mathbf{v}_j \]

and so

\[ \mathbf{0} = \sum_{j \neq i} c_j \mathbf{v}_j + (-1)\,\mathbf{v}_i, \]

contradicting the condition about the sum, since the coefficient of $\mathbf{v}_i$ is $-1 \neq 0$. ■

Sometimes we refer to this last condition about sums as follows: The set of vectors $\{\mathbf{v}_1, \cdots, \mathbf{v}_k\}$ is linearly independent if and only if there is no nontrivial linear combination which equals zero. (A nontrivial linear combination is one in which not all the scalars equal zero.)

We give the following equivalent definition of linear independence which follows from the above corollaries.

Definition 8.4.4 A set of vectors $\{\mathbf{v}_1, \cdots, \mathbf{v}_k\}$ is linearly independent if and only if none of the vectors is a linear combination of the others or, equivalently, if there is no nontrivial linear combination of the vectors which equals $\mathbf{0}$. It is said to be linearly dependent if at least one of the vectors is a linear combination of the others or, equivalently, there exists a nontrivial linear combination which equals zero.

Note the meaning of the words. To say a set of vectors is linearly dependent means at least one 
is a linear combination of the others. In other words, it is in a sense "dependent" on these other 
vectors. 

The following corollary follows right away from the row reduced echelon form. It concerns a matrix which has more columns than rows.

Corollary 8.4.5 Let $\{\mathbf{v}_1, \cdots, \mathbf{v}_k\}$ be a set of vectors in $\mathbb{F}^n$. Then if $k > n$, it must be the case that $\{\mathbf{v}_1, \cdots, \mathbf{v}_k\}$ is not linearly independent. In other words, if $k > n$, then $\{\mathbf{v}_1, \cdots, \mathbf{v}_k\}$ is dependent.

Proof: If $k > n$, then the columns of $\begin{pmatrix} \mathbf{v}_1 & \mathbf{v}_2 & \cdots & \mathbf{v}_k \end{pmatrix}$ cannot each be a pivot column because there are at most $n$ pivot columns due to the fact the matrix has only $n$ rows. In reading from left to right, pick the first column which is not a pivot column. Then from the description of row reduced echelon form, this column is a linear combination of the preceding columns and so the given vectors are dependent by Corollary 8.4.2. ■



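Example 8.4.6 Determine whether the vectors

\[
\left\{
\begin{pmatrix} 1 \\ 2 \\ 3 \\ 0 \end{pmatrix},
\begin{pmatrix} 2 \\ 1 \\ 0 \\ 1 \end{pmatrix},
\begin{pmatrix} 0 \\ 1 \\ 1 \\ 2 \end{pmatrix},
\begin{pmatrix} 3 \\ 2 \\ 2 \\ -1 \end{pmatrix}
\right\}
\]

are linearly independent. If they are linearly dependent, exhibit one of the vectors as a linear combination of the others.

Form the matrix mentioned above.

\[
\begin{pmatrix}
1 & 2 & 0 & 3 \\
2 & 1 & 1 & 2 \\
3 & 0 & 1 & 2 \\
0 & 1 & 2 & -1
\end{pmatrix}
\]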






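Then the row reduced echelon form of this matrix is

\[
\begin{pmatrix}
1 & 0 & 0 & 1 \\
0 & 1 & 0 & 1 \\
0 & 0 & 1 & -1 \\
0 & 0 & 0 & 0
\end{pmatrix}
\]

Thus not all the columns are pivot columns and so the vectors are not linearly independent. Note the fourth column is of the form

\[
1 \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix}
+ 1 \begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix}
+ (-1) \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix}.
\]

From Lemma 8.2.5, the same linear relationship exists between the columns of the original matrix. Thus

\[
1 \begin{pmatrix} 1 \\ 2 \\ 3 \\ 0 \end{pmatrix}
+ 1 \begin{pmatrix} 2 \\ 1 \\ 0 \\ 1 \end{pmatrix}
+ (-1) \begin{pmatrix} 0 \\ 1 \\ 1 \\ 2 \end{pmatrix}
= \begin{pmatrix} 3 \\ 2 \\ 2 \\ -1 \end{pmatrix}.
\]

Note the usefulness of the row reduced echelon form in discovering hidden linear relationships in collections of vectors.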



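Example 8.4.7 Determine whether the vectors

\[
\left\{
\begin{pmatrix} 1 \\ 2 \\ 3 \\ 0 \end{pmatrix},
\begin{pmatrix} 2 \\ 1 \\ 0 \\ 1 \end{pmatrix},
\begin{pmatrix} 0 \\ 1 \\ 1 \\ 2 \end{pmatrix},
\begin{pmatrix} 3 \\ 2 \\ 2 \\ 0 \end{pmatrix}
\right\}
\]

are linearly independent. If they are linearly dependent, exhibit one of the vectors as a linear combination of the others.

The matrix used to find this is

\[
\begin{pmatrix}
1 & 2 & 0 & 3 \\
2 & 1 & 1 & 2 \\
3 & 0 & 1 & 2 \\
0 & 1 & 2 & 0
\end{pmatrix}
\]

The row reduced echelon form is

\[
\begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
\]

and so every column is a pivot column. Therefore, these vectors are linearly independent and there is no way to obtain one of the vectors as a linear combination of the others.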
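The conclusions of the last two examples can be checked numerically. The sketch below is illustrative only: the helper names `det`, `combo` and `columns_to_matrix` are hypothetical, the vector entries are taken as read from Examples 8.4.6 and 8.4.7 (an assumption, since the printed matrices are hard to read), and the determinant test relies on the fact, proved later in this chapter, that a square matrix has independent columns exactly when its determinant is nonzero.

```python
def det(m):
    """Determinant by cofactor expansion along the first row (fine for small matrices)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j in range(len(m)):
        # delete row 0 and column j to form the minor
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total

def combo(coeffs, vectors):
    """Return the linear combination sum_i coeffs[i] * vectors[i]."""
    return [sum(c * v[i] for c, v in zip(coeffs, vectors))
            for i in range(len(vectors[0]))]

def columns_to_matrix(cols):
    """Build a list-of-rows matrix whose columns are the given vectors."""
    return [list(row) for row in zip(*cols)]

v1, v2, v3 = [1, 2, 3, 0], [2, 1, 0, 1], [0, 1, 1, 2]
v4_dep = [3, 2, 2, -1]  # fourth vector assumed from Example 8.4.6
v4_ind = [3, 2, 2, 0]   # fourth vector assumed from Example 8.4.7

# The relation read off the row reduced echelon form: v4 = 1*v1 + 1*v2 + (-1)*v3.
relation_holds = combo([1, 1, -1], [v1, v2, v3]) == v4_dep

det_dep = det(columns_to_matrix([v1, v2, v3, v4_dep]))  # zero: dependent columns
det_ind = det(columns_to_matrix([v1, v2, v3, v4_ind]))  # nonzero: independent columns
```

Cofactor expansion is used here only because it is short; for larger matrices row reduction is far cheaper.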



8.4.2 Subspaces 

A subspace is a set of vectors with the property that linear combinations of these vectors remain in 
the set. Geometrically, subspaces are like lines and planes which contain the origin. More precisely, 
the following definition is the right way to think of this. 

Definition 8.4.8 Let $V$ be a nonempty collection of vectors in $\mathbb{F}^n$. Then $V$ is called a subspace if whenever $\alpha, \beta$ are scalars and $\mathbf{u}, \mathbf{v}$ are vectors in $V$, the linear combination $\alpha\mathbf{u} + \beta\mathbf{v}$ is also in $V$.

It turns out that every subspace equals the span of some vectors. This is the content of the next theorem.

Theorem 8.4.9 $V$ is a subspace of $\mathbb{F}^n$ if and only if there exist vectors of $V$

\[ \{\mathbf{u}_1, \cdots, \mathbf{u}_k\} \]

such that $V = \operatorname{span}(\mathbf{u}_1, \cdots, \mathbf{u}_k)$.

Proof: Pick a vector of $V$, $\mathbf{u}_1$. If $V = \operatorname{span}(\mathbf{u}_1)$, then stop. You have found your list of vectors. If $V \neq \operatorname{span}(\mathbf{u}_1)$, then there exists $\mathbf{u}_2$, a vector of $V$ which is not a vector in $\operatorname{span}(\mathbf{u}_1)$. Consider $\operatorname{span}(\mathbf{u}_1, \mathbf{u}_2)$. If $V = \operatorname{span}(\mathbf{u}_1, \mathbf{u}_2)$, stop. Otherwise, pick $\mathbf{u}_3 \notin \operatorname{span}(\mathbf{u}_1, \mathbf{u}_2)$. Continue this way. Note that since $V$ is a subspace, these spans are each contained in $V$. The process must stop with $\mathbf{u}_k$ for some $k \le n$ since otherwise, the matrix

\[ \begin{pmatrix} \mathbf{u}_1 & \cdots & \mathbf{u}_k \end{pmatrix} \]

having these vectors as columns would have $n$ rows and $k > n$ columns. Consequently, it can have no more than $n$ pivot columns and so the first column which is not a pivot column would be a linear combination of the preceding columns, contrary to the construction.

For the other half, suppose $V = \operatorname{span}(\mathbf{u}_1, \cdots, \mathbf{u}_k)$ and let $\sum_{i=1}^{k} c_i \mathbf{u}_i$ and $\sum_{i=1}^{k} d_i \mathbf{u}_i$ be two vectors in $V$. Now let $\alpha$ and $\beta$ be two scalars. Then

\[ \alpha \sum_{i=1}^{k} c_i \mathbf{u}_i + \beta \sum_{i=1}^{k} d_i \mathbf{u}_i = \sum_{i=1}^{k} (\alpha c_i + \beta d_i)\, \mathbf{u}_i \]

which is one of the things in $\operatorname{span}(\mathbf{u}_1, \cdots, \mathbf{u}_k)$, showing that $\operatorname{span}(\mathbf{u}_1, \cdots, \mathbf{u}_k)$ has the properties of a subspace. ■

The following corollary also follows easily. 

Corollary 8.4.10 If $V$ is a subspace of $\mathbb{F}^n$, then there exist vectors of $V$, $\{\mathbf{u}_1, \cdots, \mathbf{u}_k\}$, such that $V = \operatorname{span}(\mathbf{u}_1, \cdots, \mathbf{u}_k)$ and $\{\mathbf{u}_1, \cdots, \mathbf{u}_k\}$ is linearly independent.

Proof: Let $V = \operatorname{span}(\mathbf{u}_1, \cdots, \mathbf{u}_k)$. Then let the vectors $\{\mathbf{u}_1, \cdots, \mathbf{u}_k\}$ be the columns of the following matrix.

\[ \begin{pmatrix} \mathbf{u}_1 & \cdots & \mathbf{u}_k \end{pmatrix} \]

Retain only the pivot columns. That is, determine the pivot columns from the row reduced echelon form and these are a basis for $\operatorname{span}(\mathbf{u}_1, \cdots, \mathbf{u}_k)$. ■

The message is that subspaces of $\mathbb{F}^n$ consist of spans of finite, linearly independent collections of vectors of $\mathbb{F}^n$.

The following fundamental lemma is very useful.

Lemma 8.4.11 Suppose $\{\mathbf{x}_1, \cdots, \mathbf{x}_r\}$ is linearly independent and each $\mathbf{x}_k$ is contained in $\operatorname{span}(\mathbf{y}_1, \cdots, \mathbf{y}_s)$. Then $s \ge r$. In words, spanning sets have at least as many vectors as linearly independent sets.

Proof: Since $\{\mathbf{y}_1, \cdots, \mathbf{y}_s\}$ is a spanning set, there exist scalars $a_{ij}$ such that

\[ \mathbf{x}_j = \sum_{i=1}^{s} a_{ij}\, \mathbf{y}_i. \]

Suppose $s < r$. Then the matrix $A$ whose $ij$th entry is $a_{ij}$ has fewer rows, $s$, than columns, $r$. By Corollary 8.2.8 there exists $\mathbf{d}$ such that $\mathbf{d} \neq \mathbf{0}$ but $A\mathbf{d} = \mathbf{0}$. In other words,

\[ \sum_{j=1}^{r} a_{ij} d_j = 0, \quad i = 1, 2, \cdots, s. \]

Therefore,

\[
\sum_{j=1}^{r} d_j \mathbf{x}_j
= \sum_{j=1}^{r} d_j \sum_{i=1}^{s} a_{ij}\, \mathbf{y}_i
= \sum_{i=1}^{s} \left( \sum_{j=1}^{r} a_{ij} d_j \right) \mathbf{y}_i
= \sum_{i=1}^{s} 0\, \mathbf{y}_i = \mathbf{0}
\]

which contradicts the assumption that $\{\mathbf{x}_1, \cdots, \mathbf{x}_r\}$ is linearly independent, because not all the $d_j = 0$. Thus $s \ge r$. ■

Note how this lemma was totally dependent on algebraic considerations and was independent of context. This will be considered more later in the chapter on abstract vector spaces. I didn't need to know what the $\mathbf{x}_k, \mathbf{y}_k$ were, only that the $\{\mathbf{x}_1, \cdots, \mathbf{x}_r\}$ were independent and contained in the span of the $\mathbf{y}_k$.

8.4.3 Basis Of A Subspace 

It was just shown in Corollary 8.4.10 that every subspace of $\mathbb{F}^n$ is equal to the span of a linearly independent collection of vectors of $\mathbb{F}^n$. Such a collection of vectors is called a basis.

Definition 8.4.12 Let $V$ be a subspace of $\mathbb{F}^n$. Then $\{\mathbf{u}_1, \cdots, \mathbf{u}_k\}$ is a basis for $V$ if the following two conditions hold.

1. $\operatorname{span}(\mathbf{u}_1, \cdots, \mathbf{u}_k) = V$.

2. $\{\mathbf{u}_1, \cdots, \mathbf{u}_k\}$ is linearly independent.

The plural of basis is bases.

The main theorem about bases is the following. 

Theorem 8.4.13 Let $V$ be a subspace of $\mathbb{F}^n$ and suppose $\{\mathbf{u}_1, \cdots, \mathbf{u}_k\}$, $\{\mathbf{v}_1, \cdots, \mathbf{v}_m\}$ are two bases for $V$. Then $k = m$.

Proof: This follows right away from Lemma 8.4.11. $\{\mathbf{u}_1, \cdots, \mathbf{u}_k\}$ is a spanning set while $\{\mathbf{v}_1, \cdots, \mathbf{v}_m\}$ is linearly independent so $k \ge m$. Also $\{\mathbf{v}_1, \cdots, \mathbf{v}_m\}$ is a spanning set while $\{\mathbf{u}_1, \cdots, \mathbf{u}_k\}$ is linearly independent so $m \ge k$.

Now here is another proof. Suppose $k < m$. Then since $\{\mathbf{u}_1, \cdots, \mathbf{u}_k\}$ is a basis for $V$, each $\mathbf{v}_i$ is a linear combination of the vectors of $\{\mathbf{u}_1, \cdots, \mathbf{u}_k\}$. Consider the matrix

\[ \begin{pmatrix} \mathbf{u}_1 & \cdots & \mathbf{u}_k & \mathbf{v}_1 & \cdots & \mathbf{v}_m \end{pmatrix} \]

in which each of the $\mathbf{u}_i$ is a pivot column because the $\{\mathbf{u}_1, \cdots, \mathbf{u}_k\}$ are linearly independent. Therefore, the row reduced echelon form of this matrix is

\[ \begin{pmatrix} \mathbf{e}_1 & \cdots & \mathbf{e}_k & \mathbf{w}_1 & \cdots & \mathbf{w}_m \end{pmatrix} \qquad (8.3) \]

where each $\mathbf{w}_j$ has zeroes below the $k$th row. This is because of Lemma 8.2.5 which implies each $\mathbf{w}_i$ is a linear combination of the $\mathbf{e}_1, \cdots, \mathbf{e}_k$. Discarding the bottom $n - k$ rows of zeroes in the above yields the matrix

\[ \begin{pmatrix} \mathbf{e}'_1 & \cdots & \mathbf{e}'_k & \mathbf{w}'_1 & \cdots & \mathbf{w}'_m \end{pmatrix} \]

in which all vectors are in $\mathbb{F}^k$. Since $m > k$, it follows from Corollary 8.4.5 that the vectors $\{\mathbf{w}'_1, \cdots, \mathbf{w}'_m\}$ are dependent. Therefore, some $\mathbf{w}'_j$ is a linear combination of the other $\mathbf{w}'_i$. Therefore, $\mathbf{w}_j$ is a linear combination of the other $\mathbf{w}_i$ in 8.3. By Lemma 8.2.5 again, the same linear relationship exists between the $\{\mathbf{v}_1, \cdots, \mathbf{v}_m\}$, showing that $\{\mathbf{v}_1, \cdots, \mathbf{v}_m\}$ is not linearly independent and contradicting the assumption that $\{\mathbf{v}_1, \cdots, \mathbf{v}_m\}$ is a basis. It follows $m \le k$. Similarly, $k \le m$. ■

This is a very important theorem so here is yet another proof of it.

Theorem 8.4.14 Let $V$ be a subspace and suppose $\{\mathbf{u}_1, \cdots, \mathbf{u}_k\}$ and $\{\mathbf{v}_1, \cdots, \mathbf{v}_m\}$ are two bases for $V$. Then $k = m$.

Proof: Suppose $k > m$. Then since the vectors $\{\mathbf{v}_1, \cdots, \mathbf{v}_m\}$ span $V$, there exist scalars $c_{ij}$ such that

\[ \sum_{i=1}^{m} c_{ij}\, \mathbf{v}_i = \mathbf{u}_j. \]

Therefore,

\[ \sum_{j=1}^{k} d_j \mathbf{u}_j = \mathbf{0} \quad \text{if and only if} \quad \sum_{j=1}^{k} \sum_{i=1}^{m} c_{ij} d_j\, \mathbf{v}_i = \mathbf{0} \]

if and only if

\[ \sum_{i=1}^{m} \left( \sum_{j=1}^{k} c_{ij} d_j \right) \mathbf{v}_i = \mathbf{0}. \]

Now since $\{\mathbf{v}_1, \cdots, \mathbf{v}_m\}$ is independent, this happens if and only if

\[ \sum_{j=1}^{k} c_{ij} d_j = 0, \quad i = 1, 2, \cdots, m. \]

However, this is a system of $m$ equations in $k$ variables, $d_1, \cdots, d_k$, and $m < k$. Therefore, there exists a solution to this system of equations in which not all the $d_j$ are equal to zero. Recall why this is so. The augmented matrix for the system is of the form $(C \mid \mathbf{0})$ where $C$ is a matrix which has more columns than rows. Therefore, there are free variables and hence nonzero solutions to the system of equations. However, this contradicts the linear independence of $\{\mathbf{u}_1, \cdots, \mathbf{u}_k\}$ because, as explained above, $\sum_{j=1}^{k} d_j \mathbf{u}_j = \mathbf{0}$. Similarly it cannot happen that $m > k$. ■

The following definition can now be stated.

Definition 8.4.15 Let $V$ be a subspace of $\mathbb{F}^n$. Then the dimension of $V$ is defined to be the number of vectors in a basis.

Corollary 8.4.16 The dimension of $\mathbb{F}^n$ is $n$.

Proof: You only need to exhibit a basis for $\mathbb{F}^n$ which has $n$ vectors. Such a basis is $\{\mathbf{e}_1, \cdots, \mathbf{e}_n\}$. ■

Corollary 8.4.17 Suppose $\{\mathbf{v}_1, \cdots, \mathbf{v}_n\}$ is linearly independent and each $\mathbf{v}_i$ is a vector in $\mathbb{F}^n$. Then $\{\mathbf{v}_1, \cdots, \mathbf{v}_n\}$ is a basis for $\mathbb{F}^n$. Suppose $\{\mathbf{v}_1, \cdots, \mathbf{v}_m\}$ spans $\mathbb{F}^n$. Then $m \ge n$. If $\{\mathbf{v}_1, \cdots, \mathbf{v}_n\}$ spans $\mathbb{F}^n$, then $\{\mathbf{v}_1, \cdots, \mathbf{v}_n\}$ is linearly independent.

Proof: Let $\mathbf{u}$ be a vector of $\mathbb{F}^n$ and consider the matrix

\[ \begin{pmatrix} \mathbf{v}_1 & \cdots & \mathbf{v}_n & \mathbf{u} \end{pmatrix}. \]

Since each $\mathbf{v}_i$ is a pivot column, the row reduced echelon form is

\[ \begin{pmatrix} \mathbf{e}_1 & \cdots & \mathbf{e}_n & \mathbf{w} \end{pmatrix} \]

and so, since $\mathbf{w}$ is in $\operatorname{span}(\mathbf{e}_1, \cdots, \mathbf{e}_n)$, it follows from Lemma 8.2.5 that $\mathbf{u}$ is one of the vectors in $\operatorname{span}(\mathbf{v}_1, \cdots, \mathbf{v}_n)$. Therefore, $\{\mathbf{v}_1, \cdots, \mathbf{v}_n\}$ is a basis as claimed.

To establish the second claim, suppose that $m < n$. Then letting $\mathbf{v}_{i_1}, \cdots, \mathbf{v}_{i_k}$ be the pivot columns of the matrix

\[ \begin{pmatrix} \mathbf{v}_1 & \cdots & \mathbf{v}_m \end{pmatrix} \]

it follows $k \le m < n$ and these $k$ pivot columns would be a basis for $\mathbb{F}^n$ having fewer than $n$ vectors, contrary to Theorem 8.4.13 which states every two bases have the same number of vectors in them.

Finally consider the third claim. If $\{\mathbf{v}_1, \cdots, \mathbf{v}_n\}$ is not linearly independent, then replace this list with $\{\mathbf{v}_{i_1}, \cdots, \mathbf{v}_{i_k}\}$ where these are the pivot columns of the matrix

\[ \begin{pmatrix} \mathbf{v}_1 & \cdots & \mathbf{v}_n \end{pmatrix}. \]

Then $\{\mathbf{v}_{i_1}, \cdots, \mathbf{v}_{i_k}\}$ spans $\mathbb{F}^n$ and is linearly independent so it is a basis having fewer than $n$ vectors, contrary to Theorem 8.4.13 which states every two bases have the same number of vectors in them.



Example 8.4.18 Find the rank of the following matrix. If the rank is $r$, identify $r$ columns in the original matrix which have the property that every other column may be written as a linear combination of these. Also find a basis for the row and column spaces of the matrices.

[matrix not legible in the source]

The row reduced echelon form is

[matrix not legible in the source]

and so the rank of the matrix is 3. A basis for the column space is the first three columns of the original matrix. I know they span because the first three columns of the row reduced echelon form above span the column space of that matrix. They are linearly independent because the first three columns of the row reduced echelon form are linearly independent. By Lemma 8.2.5 all linear relationships are preserved and so these first three vectors form a basis for the column space. The three nonzero rows of the row reduced echelon form form a basis for the row space of the original matrix.

Example 8.4.19 Find the rank of the following matrix. If the rank is r, identify r columns in 
the original matrix which have the property that every other column may be written as a linear 
combination of these. Also find a basis for the row and column spaces of the matrices. 

12 3 1 

1 12-62 

-2 3 1 2 

The row reduced echelon form is

[matrix not legible in the source]

A basis for the column space of this row reduced echelon form is the first, second and fourth columns. Therefore, a basis for the column space in the original matrix is the first, second and fourth columns. The rank of the matrix is 3. A basis for the row space of the original matrix is the nonzero rows of the row reduced echelon form.

8.4.4 Extending An Independent Set To Form A Basis 

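Suppose $\{\mathbf{v}_1, \cdots, \mathbf{v}_m\}$ is a linearly independent set of vectors in $\mathbb{F}^n$. It turns out there is a larger set of vectors, $\{\mathbf{v}_1, \cdots, \mathbf{v}_m, \mathbf{v}_{m+1}, \cdots, \mathbf{v}_n\}$, which is a basis for $\mathbb{F}^n$. It is easy to do this using the row reduced echelon form. Consider the following matrix having rank $n$ in which the columns are shown.

\[ \begin{pmatrix} \mathbf{v}_1 & \cdots & \mathbf{v}_m & \mathbf{e}_1 & \mathbf{e}_2 & \cdots & \mathbf{e}_n \end{pmatrix}. \]

Since the $\{\mathbf{v}_1, \cdots, \mathbf{v}_m\}$ are linearly independent, the row reduced echelon form of this matrix is of the form

\[ \begin{pmatrix} \mathbf{e}_1 & \cdots & \mathbf{e}_m & \mathbf{u}_1 & \mathbf{u}_2 & \cdots & \mathbf{u}_n \end{pmatrix}. \]

Now the pivot columns can be identified and this leads to a basis for the column space of the original matrix which is of the form

\[ \{\mathbf{v}_1, \cdots, \mathbf{v}_m, \mathbf{e}_{i_1}, \cdots, \mathbf{e}_{i_{n-m}}\}. \]

Since the matrix has rank $n$, its column space is all of $\mathbb{F}^n$. This proves the following theorem.

Theorem 8.4.20 Let $\{\mathbf{v}_1, \cdots, \mathbf{v}_m\}$ be a linearly independent set of vectors in $\mathbb{F}^n$. Then there is a larger set of vectors, $\{\mathbf{v}_1, \cdots, \mathbf{v}_m, \mathbf{v}_{m+1}, \cdots, \mathbf{v}_n\}$, which is a basis for $\mathbb{F}^n$.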
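The construction behind this theorem is mechanical enough to sketch in code. What follows is a minimal illustration under stated assumptions, not the book's own algorithm listing: the function names `pivot_columns` and `extend_to_basis` are hypothetical, the two starting vectors in $\mathbb{R}^4$ are chosen for illustration, and forward elimination is used because the pivot columns of an echelon form agree with those of the row reduced echelon form.

```python
from fractions import Fraction

def pivot_columns(rows):
    """Indices of the pivot columns of a matrix given as a list of rows."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        # find a usable pivot in column c at or below row r
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue  # no pivot in this column
        m[r], m[p] = m[p], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][c] / m[r][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return pivots

def extend_to_basis(vectors, n):
    """Extend independent vectors in F^n to a basis via the matrix (v_1 ... v_m e_1 ... e_n)."""
    standard = [[1 if i == j else 0 for i in range(n)] for j in range(n)]
    cols = vectors + standard
    rows = [[col[i] for col in cols] for i in range(n)]  # columns -> rows
    return [cols[c] for c in pivot_columns(rows)]

# Two independent vectors in R^4 (illustrative choice).
basis = extend_to_basis([[1, 0, 1, 0], [1, 1, 0, 0]], 4)
```

Because the given vectors come first, they are always pivot columns, so the returned basis begins with them and is completed by whichever standard basis vectors turn out to be pivot columns.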



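Example 8.4.21 The vectors

\[
\left\{
\begin{pmatrix} 1 \\ 0 \\ 1 \\ 0 \end{pmatrix},
\begin{pmatrix} 1 \\ 1 \\ 0 \\ 0 \end{pmatrix}
\right\}
\]

are linearly independent. Enlarge this set of vectors to form a basis for $\mathbb{R}^4$.

Using the above technique, consider the following matrix

\[
\begin{pmatrix}
1 & 1 & 1 & 0 & 0 & 0 \\
0 & 1 & 0 & 1 & 0 & 0 \\
1 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix}
\]

whose row reduced echelon form is

\[
\begin{pmatrix}
1 & 0 & 0 & 0 & 1 & 0 \\
0 & 1 & 0 & 1 & 0 & 0 \\
0 & 0 & 1 & -1 & -1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix}
\]

The pivot columns are numbers 1, 2, 3, and 6. Therefore, a basis is

\[
\left\{
\begin{pmatrix} 1 \\ 0 \\ 1 \\ 0 \end{pmatrix},
\begin{pmatrix} 1 \\ 1 \\ 0 \\ 0 \end{pmatrix},
\begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix},
\begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix}
\right\}.
\]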



8.4.5 Finding The Null Space Or Kernel Of A Matrix 

Let A be an m x n matrix. 

Definition 8.4.22 ker (A), also referred to as the null space of A is defined as follows. 

ker (A) = {x : Ax = 0} 

and to find ker (A) one must solve the system of equations Ax = 0. 

This is not new! There is just some new terminology being used. To repeat, ker (A) is the 
solution to the system Ax = 0. 



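Example 8.4.23 Let

\[
A = \begin{pmatrix}
1 & 2 & 1 \\
0 & -1 & 1 \\
2 & 3 & 3
\end{pmatrix}.
\]

Find $\ker(A)$.

You need to solve the equation $A\mathbf{x} = \mathbf{0}$. To do this you write the augmented matrix and then obtain the row reduced echelon form and the solution. The augmented matrix is

\[
\left( \begin{array}{ccc|c}
1 & 2 & 1 & 0 \\
0 & -1 & 1 & 0 \\
2 & 3 & 3 & 0
\end{array} \right)
\]

Next place this matrix in row reduced echelon form,

\[
\left( \begin{array}{ccc|c}
1 & 0 & 3 & 0 \\
0 & 1 & -1 & 0 \\
0 & 0 & 0 & 0
\end{array} \right)
\]

Note that $x_1$ and $x_2$ are basic variables while $x_3$ is a free variable. Therefore, the solution to this system of equations, $A\mathbf{x} = \mathbf{0}$, is given by

\[
\begin{pmatrix} -3t \\ t \\ t \end{pmatrix}, \quad t \in \mathbb{R}.
\]
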
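Example 8.4.24 Let

\[
A = \begin{pmatrix}
1 & 2 & 1 & 0 & 1 \\
2 & -1 & 1 & 3 & 0 \\
3 & 1 & 2 & 3 & 1 \\
4 & -2 & 2 & 6 & 0
\end{pmatrix}.
\]

Find the null space of $A$.

You need to solve the equation $A\mathbf{x} = \mathbf{0}$. The augmented matrix is

\[
\left( \begin{array}{ccccc|c}
1 & 2 & 1 & 0 & 1 & 0 \\
2 & -1 & 1 & 3 & 0 & 0 \\
3 & 1 & 2 & 3 & 1 & 0 \\
4 & -2 & 2 & 6 & 0 & 0
\end{array} \right)
\]

Its row reduced echelon form is

\[
\left( \begin{array}{ccccc|c}
1 & 0 & \frac{3}{5} & \frac{6}{5} & \frac{1}{5} & 0 \\
0 & 1 & \frac{1}{5} & -\frac{3}{5} & \frac{2}{5} & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0
\end{array} \right)
\]

It follows $x_1$ and $x_2$ are basic variables and $x_3, x_4, x_5$ are free variables. Therefore, $\ker(A)$ is given by

\[
\left\{
\begin{pmatrix}
\left(-\frac{3}{5}\right) s_1 + \left(-\frac{6}{5}\right) s_2 + \left(-\frac{1}{5}\right) s_3 \\
\left(-\frac{1}{5}\right) s_1 + \left(\frac{3}{5}\right) s_2 + \left(-\frac{2}{5}\right) s_3 \\
s_1 \\ s_2 \\ s_3
\end{pmatrix}
: s_1, s_2, s_3 \in \mathbb{R}
\right\}.
\]

We write this in the form

\[
s_1 \begin{pmatrix} -\frac{3}{5} \\ -\frac{1}{5} \\ 1 \\ 0 \\ 0 \end{pmatrix}
+ s_2 \begin{pmatrix} -\frac{6}{5} \\ \frac{3}{5} \\ 0 \\ 1 \\ 0 \end{pmatrix}
+ s_3 \begin{pmatrix} -\frac{1}{5} \\ -\frac{2}{5} \\ 0 \\ 0 \\ 1 \end{pmatrix},
\quad s_1, s_2, s_3 \in \mathbb{R}.
\]

In other words, the null space of this matrix equals the span of the three vectors above. Thus

\[
\ker(A) = \operatorname{span}\left(
\begin{pmatrix} -\frac{3}{5} \\ -\frac{1}{5} \\ 1 \\ 0 \\ 0 \end{pmatrix},
\begin{pmatrix} -\frac{6}{5} \\ \frac{3}{5} \\ 0 \\ 1 \\ 0 \end{pmatrix},
\begin{pmatrix} -\frac{1}{5} \\ -\frac{2}{5} \\ 0 \\ 0 \\ 1 \end{pmatrix}
\right).
\]

This is the same as

\[
\ker(A) = \operatorname{span}\left(
\begin{pmatrix} \frac{3}{5} \\ \frac{1}{5} \\ -1 \\ 0 \\ 0 \end{pmatrix},
\begin{pmatrix} \frac{6}{5} \\ -\frac{3}{5} \\ 0 \\ -1 \\ 0 \end{pmatrix},
\begin{pmatrix} \frac{1}{5} \\ \frac{2}{5} \\ 0 \\ 0 \\ -1 \end{pmatrix}
\right).
\]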

Notice also that the three vectors above are linearly independent and so the dimension of ker (A) 
is 3. This is generally the way it works. The number of free variables equals the dimension of the 
null space while the number of basic variables equals the number of pivot columns which equals the 
rank. We state this in the following theorem. 

Definition 8.4.25 The dimension of the null space of a matrix is called the nullity$^2$ and written as $\operatorname{null}(A)$.

Theorem 8.4.26 Let A be an m x n matrix. Then rank (A) + null (A) = n. 
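A quick numerical check of this theorem on a single matrix can be sketched as follows. This is only an illustration: the matrix and the three kernel vectors are the ones Example 8.4.24 appears to use (an assumption, since the printed entries are hard to read), the helper `matvec` is hypothetical, and the rank is justified by inspection rather than computed.

```python
from fractions import Fraction as F

# Matrix assumed from Example 8.4.24.
A = [[1, 2, 1, 0, 1],
     [2, -1, 1, 3, 0],
     [3, 1, 2, 3, 1],
     [4, -2, 2, 6, 0]]

# Candidate basis of ker(A), one vector per free variable x3, x4, x5.
kernel_basis = [
    [F(-3, 5), F(-1, 5), 1, 0, 0],
    [F(-6, 5), F(3, 5), 0, 1, 0],
    [F(-1, 5), F(-2, 5), 0, 0, 1],
]

def matvec(M, x):
    """Multiply the matrix M (list of rows) by the vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in M]

# Every candidate vector is sent to zero by A.
in_kernel = all(matvec(A, v) == [0, 0, 0, 0] for v in kernel_basis)

# Row 3 = row 1 + row 2 and row 4 = 2 * row 2, while rows 1 and 2 are not
# multiples of each other, so the rank is 2.
row3_ok = [a + b for a, b in zip(A[0], A[1])] == A[2]
row4_ok = [2 * b for b in A[1]] == A[3]
rank = 2

nullity = len(kernel_basis)  # independent: last three coordinates form an identity block
n = len(A[0])                # number of columns
```

Here `rank + nullity` equals 5, the number of columns, in agreement with the theorem.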

8.4.6 Rank And Existence Of Solutions To Linear Systems 

Consider the linear system of equations, 

Ax = b (8.4) 

where $A$ is an $m \times n$ matrix, $\mathbf{x}$ is an $n \times 1$ column vector, and $\mathbf{b}$ is an $m \times 1$ column vector. Suppose

\[ A = \begin{pmatrix} \mathbf{a}_1 & \cdots & \mathbf{a}_n \end{pmatrix} \]

where the $\mathbf{a}_k$ denote the columns of $A$. Then $\mathbf{x} = (x_1, \cdots, x_n)^T$ is a solution of the system 8.4 if and only if

\[ x_1 \mathbf{a}_1 + \cdots + x_n \mathbf{a}_n = \mathbf{b} \]

which says that $\mathbf{b}$ is a vector in $\operatorname{span}(\mathbf{a}_1, \cdots, \mathbf{a}_n)$. This shows that there exists a solution to the system 8.4 if and only if $\mathbf{b}$ is contained in $\operatorname{span}(\mathbf{a}_1, \cdots, \mathbf{a}_n)$. In words, there is a solution to 8.4 if and only if $\mathbf{b}$ is in the column space of $A$. In terms of rank, the following proposition describes the situation.

2 Isn't it amazing how many different words are available for use in linear algebra?

Proposition 8.4.27 Let $A$ be an $m \times n$ matrix and let $\mathbf{b}$ be an $m \times 1$ column vector. Then there exists a solution to 8.4 if and only if

\[ \operatorname{rank}\left( A \mid \mathbf{b} \right) = \operatorname{rank}(A). \qquad (8.5) \]

Proof: Place $\left( A \mid \mathbf{b} \right)$ and $A$ in row reduced echelon form, respectively $B$ and $C$. If the above condition on rank is true, then both $B$ and $C$ have the same number of nonzero rows. In particular, you cannot have a row of the form

\[ \begin{pmatrix} 0 & \cdots & 0 & \blacksquare \end{pmatrix} \]

where $\blacksquare \neq 0$ in $B$. Therefore, there will exist a solution to the system 8.4.

Conversely, suppose there exists a solution. This means there cannot be such a row in $B$ described above. Therefore, $B$ and $C$ must have the same number of zero rows and so they have the same number of nonzero rows. Therefore, the rank of the two matrices in 8.5 is the same. ■
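As a small worked illustration of the rank condition (the matrix and right-hand sides below are chosen here for illustration and do not come from the text), take a $2 \times 2$ matrix of rank 1 and compare two choices of $\mathbf{b}$. Subtracting twice the first row from the second gives:

```latex
A = \begin{pmatrix} 1 & 2 \\ 2 & 4 \end{pmatrix}, \qquad \operatorname{rank}(A) = 1.

% b = (1, 2)^T: the augmented matrix keeps rank 1, so a solution exists.
\left( \begin{array}{cc|c} 1 & 2 & 1 \\ 2 & 4 & 2 \end{array} \right)
\longrightarrow
\left( \begin{array}{cc|c} 1 & 2 & 1 \\ 0 & 0 & 0 \end{array} \right),
\qquad \operatorname{rank}(A \mid \mathbf{b}) = 1.

% b = (1, 3)^T: a row (0 0 | 1) appears, the rank jumps to 2, and no solution exists.
\left( \begin{array}{cc|c} 1 & 2 & 1 \\ 2 & 4 & 3 \end{array} \right)
\longrightarrow
\left( \begin{array}{cc|c} 1 & 2 & 1 \\ 0 & 0 & 1 \end{array} \right),
\qquad \operatorname{rank}(A \mid \mathbf{b}) = 2.
```

In the first case $\mathbf{x} = (1, 0)^T$ is one solution; in the second the last row reads $0 = 1$.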

8.5 Fredholm Alternative 

There is a very useful version of Proposition 8.4.27 known as the Fredholm alternative. I will 
only present this for the case of real matrices here. Later a much more elegant and general approach 
is presented which allows for the general case of complex matrices. 
The following definition is used to state the Fredholm alternative. 

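Definition 8.5.1 Let $S \subseteq \mathbb{R}^m$. Then $S^{\perp} = \{\mathbf{z} \in \mathbb{R}^m : \mathbf{z} \cdot \mathbf{s} = 0 \text{ for every } \mathbf{s} \in S\}$. The funny exponent, $\perp$, is called "perp".

Now note

\[ \ker\left(A^T\right) = \left\{ \mathbf{z} : A^T \mathbf{z} = \mathbf{0} \right\} = \left\{ \mathbf{z} : \sum_{k=1}^{m} z_k A_{kl} = 0 \text{ for each } l \right\}. \]

Lemma 8.5.2 Let $A$ be a real $m \times n$ matrix, let $\mathbf{x} \in \mathbb{R}^n$ and $\mathbf{y} \in \mathbb{R}^m$. Then

\[ (A\mathbf{x} \cdot \mathbf{y}) = \left( \mathbf{x} \cdot A^T \mathbf{y} \right). \]

Proof: This follows right away from the definition of the dot product and matrix multiplication.

\[
(A\mathbf{x} \cdot \mathbf{y}) = \sum_{k,l} A_{kl}\, x_l\, y_k
= \sum_{k,l} \left(A^T\right)_{lk} x_l\, y_k
= \left( \mathbf{x} \cdot A^T \mathbf{y} \right). \ ■
\]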

Now it is time to state the Fredholm alternative. The first version of this is the following theorem. 

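Theorem 8.5.3 Let $A$ be a real $m \times n$ matrix and let $\mathbf{b} \in \mathbb{R}^m$. There exists a solution $\mathbf{x}$ to the equation $A\mathbf{x} = \mathbf{b}$ if and only if $\mathbf{b} \in \ker\left(A^T\right)^{\perp}$.

Proof: First suppose $\mathbf{b} \in \ker\left(A^T\right)^{\perp}$. Then this says that if $A^T \mathbf{x} = \mathbf{0}$, it follows that $\mathbf{b} \cdot \mathbf{x} = 0$. In other words, taking the transpose, if

\[ \mathbf{x}^T A = \mathbf{0}, \text{ then } \mathbf{b} \cdot \mathbf{x} = 0. \]

In other words, letting $\mathbf{x} = (x_1, \cdots, x_m)^T$, it follows that if

\[ \sum_{i=1}^{m} x_i A_{ij} = 0 \text{ for each } j, \]

then it follows

\[ \sum_{i=1}^{m} b_i x_i = 0. \]

In other words, if you get a row of zeros in row reduced echelon form for $A$ then the same row operations produce a zero in the $m \times 1$ matrix $\mathbf{b}$. Consequently

\[ \operatorname{rank}\left( A \mid \mathbf{b} \right) = \operatorname{rank}(A) \]

and so by Proposition 8.4.27, there exists a solution $\mathbf{x}$ to the system $A\mathbf{x} = \mathbf{b}$. It remains to go the other direction.

Let $\mathbf{z} \in \ker\left(A^T\right)$ and suppose $A\mathbf{x} = \mathbf{b}$. I need to verify $\mathbf{b} \cdot \mathbf{z} = 0$. By Lemma 8.5.2,

\[ \mathbf{b} \cdot \mathbf{z} = A\mathbf{x} \cdot \mathbf{z} = \mathbf{x} \cdot A^T \mathbf{z} = \mathbf{x} \cdot \mathbf{0} = 0. \ ■ \]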
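A small worked case may make the theorem concrete. The matrix below is chosen here purely for illustration and is not from the text; the computation just applies the statement of Theorem 8.5.3.

```latex
A = \begin{pmatrix} 1 & 2 \\ 3 & 6 \end{pmatrix}, \qquad
A^T = \begin{pmatrix} 1 & 3 \\ 2 & 6 \end{pmatrix}.

% A^T z = 0 reduces to the single equation z_1 + 3 z_2 = 0, so
\ker\left(A^T\right) = \operatorname{span}\left( \begin{pmatrix} 3 \\ -1 \end{pmatrix} \right).

% Hence b is in ker(A^T)^perp exactly when 3 b_1 - b_2 = 0.
\mathbf{b} = \begin{pmatrix} 1 \\ 3 \end{pmatrix}: \quad 3(1) - 3 = 0,
\quad \text{and indeed } A \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \mathbf{b}.

\mathbf{b} = \begin{pmatrix} 1 \\ 0 \end{pmatrix}: \quad 3(1) - 0 = 3 \neq 0,
\quad \text{and } x_1 + 2x_2 = 1,\ 3x_1 + 6x_2 = 0 \text{ has no solution.}
```

The second system is inconsistent because three times its first equation would force $3x_1 + 6x_2 = 3$.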

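This implies the following corollary which is also called the Fredholm alternative. The "alternative" becomes more clear in this corollary.

Corollary 8.5.4 Let $A$ be an $m \times n$ matrix. Then $A$ maps $\mathbb{R}^n$ onto $\mathbb{R}^m$ if and only if the only solution to $A^T \mathbf{x} = \mathbf{0}$ is $\mathbf{x} = \mathbf{0}$.

Proof: If the only solution to $A^T \mathbf{x} = \mathbf{0}$ is $\mathbf{x} = \mathbf{0}$, then $\ker\left(A^T\right) = \{\mathbf{0}\}$ and so $\ker\left(A^T\right)^{\perp} = \mathbb{R}^m$ because every $\mathbf{b} \in \mathbb{R}^m$ has the property that $\mathbf{b} \cdot \mathbf{0} = 0$. Therefore, $A\mathbf{x} = \mathbf{b}$ has a solution for any $\mathbf{b} \in \mathbb{R}^m$ because the $\mathbf{b}$ for which there is a solution are those in $\ker\left(A^T\right)^{\perp}$ by Theorem 8.5.3. In other words, $A$ maps $\mathbb{R}^n$ onto $\mathbb{R}^m$.

Conversely if $A$ is onto, then by Theorem 8.5.3 every $\mathbf{b} \in \mathbb{R}^m$ is in $\ker\left(A^T\right)^{\perp}$ and so if $A^T \mathbf{x} = \mathbf{0}$, then $\mathbf{b} \cdot \mathbf{x} = 0$ for every $\mathbf{b}$. In particular, this holds for $\mathbf{b} = \mathbf{x}$. Hence if $A^T \mathbf{x} = \mathbf{0}$, then $\mathbf{x} = \mathbf{0}$. ■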

Here is an amusing example. 

Example 8.5.5 Let $A$ be an $m \times n$ matrix in which $m > n$. Then $A$ cannot map onto $\mathbb{R}^m$.

The reason for this is that $A^T$ is an $n \times m$ matrix where $m > n$ and so in the augmented matrix

\[ \left( A^T \mid \mathbf{0} \right) \]

there must be some free variables. Thus there exists a nonzero vector $\mathbf{x}$ such that $A^T \mathbf{x} = \mathbf{0}$.

8.5.1 Row, Column, And Determinant Rank 

I will now present a review of earlier topics and prove Theorem 8.3.4. 

Definition 8.5.6 A sub-matrix of a matrix $A$ is the rectangular array of numbers obtained by deleting some rows and columns of $A$. Let $A$ be an $m \times n$ matrix. The determinant rank of the matrix equals $r$ where $r$ is the largest number such that some $r \times r$ sub-matrix of $A$ has a nonzero determinant. The row rank is defined to be the dimension of the span of the rows. The column rank is defined to be the dimension of the span of the columns.

Theorem 8.5.7 If $A$, an $m \times n$ matrix, has determinant rank $r$, then there exist $r$ rows of the matrix such that every other row is a linear combination of these $r$ rows.

Proof: Suppose the determinant rank of $A = (a_{ij})$ equals $r$. Thus some $r \times r$ submatrix has nonzero determinant and there is no larger square submatrix which has nonzero determinant. Suppose such a submatrix is determined by the $r$ columns whose indices are

\[ j_1 < \cdots < j_r \]

and the $r$ rows whose indices are

\[ i_1 < \cdots < i_r. \]

I want to show that every row is a linear combination of these rows. Consider the $l$th row and let $p$ be an index between 1 and $n$. Form the following $(r+1) \times (r+1)$ matrix

\[
\begin{pmatrix}
a_{i_1 j_1} & \cdots & a_{i_1 j_r} & a_{i_1 p} \\
\vdots & & \vdots & \vdots \\
a_{i_r j_1} & \cdots & a_{i_r j_r} & a_{i_r p} \\
a_{l j_1} & \cdots & a_{l j_r} & a_{l p}
\end{pmatrix}
\]

Of course you can assume $l \notin \{i_1, \cdots, i_r\}$ because there is nothing to prove if the $l$th row is one of the chosen ones. The above matrix has determinant 0. This is because if $p \notin \{j_1, \cdots, j_r\}$ then the above would be a submatrix of $A$ which is too large to have nonzero determinant. On the other hand, if $p \in \{j_1, \cdots, j_r\}$ then the above matrix has two columns which are equal so its determinant is still 0.

Expand the determinant of the above matrix along the last column. Let $C_k$ denote the cofactor associated with the entry $a_{i_k p}$. This is not dependent on the choice of $p$. Remember, you delete the column and the row the entry is in and take the determinant of what is left and multiply by $-1$ raised to an appropriate power. Let $C$ denote the cofactor associated with $a_{lp}$. This is given to be nonzero, it being the determinant of the matrix

\[
\begin{pmatrix}
a_{i_1 j_1} & \cdots & a_{i_1 j_r} \\
\vdots & & \vdots \\
a_{i_r j_1} & \cdots & a_{i_r j_r}
\end{pmatrix}
\]

Thus

\[ 0 = a_{lp}\, C + \sum_{k=1}^{r} C_k\, a_{i_k p} \]

which implies

\[ a_{lp} = \sum_{k=1}^{r} \frac{-C_k}{C}\, a_{i_k p} \equiv \sum_{k=1}^{r} m_k\, a_{i_k p}. \]

Since this is true for every $p$ and since $m_k$ does not depend on $p$, this has shown the $l$th row is a linear combination of the $i_1, i_2, \cdots, i_r$ rows. ■

Corollary 8.5.8 The determinant rank equals the row rank.

Proof: From Theorem 8.5.7, the row rank is no larger than the determinant rank. Could the row rank be smaller than the determinant rank? If so, there exist $p$ rows for $p < r$ such that the span of these $p$ rows equals the row space. But this implies that the $r \times r$ sub-matrix whose determinant is nonzero also has row rank no larger than $p$, which is impossible if its determinant is to be nonzero because at least one row is a linear combination of the others. ■

Corollary 8.5.9 If $A$ has determinant rank $r$, then there exist $r$ columns of the matrix such that every other column is a linear combination of these $r$ columns. Also the column rank equals the determinant rank.

Proof: This follows from the above by considering $A^T$. The rows of $A^T$ are the columns of $A$ and the determinant rank of $A^T$ and $A$ are the same. Therefore, from Corollary 8.5.8, column rank of $A$ = row rank of $A^T$ = determinant rank of $A^T$ = determinant rank of $A$. ■

The following theorem is of fundamental importance and ties together many of the ideas presented 
above. 

Theorem 8.5.10 Let $A$ be an $n \times n$ matrix. Then the following are equivalent.

1. $\det(A) = 0$.

2. $A$, $A^T$ are not one to one.

3. $A$ is not onto.

Proof: Suppose $\det(A) = 0$. Then the determinant rank of $A = r < n$. Therefore, there exist $r$ columns such that every other column is a linear combination of these columns by Corollary 8.5.9. In particular, it follows that for some $m$, the $m$th column is a linear combination of all the others. Thus letting $A = \begin{pmatrix} \mathbf{a}_1 & \cdots & \mathbf{a}_m & \cdots & \mathbf{a}_n \end{pmatrix}$ where the columns are denoted by $\mathbf{a}_i$, there exist scalars $\alpha_i$ such that

\[ \mathbf{a}_m = \sum_{k \neq m} \alpha_k \mathbf{a}_k. \]

Now consider the column vector $\mathbf{x} = \begin{pmatrix} \alpha_1 & \cdots & -1 & \cdots & \alpha_n \end{pmatrix}^T$, with $-1$ in the $m$th position. Then

\[ A\mathbf{x} = -\mathbf{a}_m + \sum_{k \neq m} \alpha_k \mathbf{a}_k = \mathbf{0}. \]

Since also $A\mathbf{0} = \mathbf{0}$, it follows $A$ is not one to one. Similarly, $A^T$ is not one to one by the same argument applied to $A^T$. This verifies that 1.) implies 2.).

Now suppose 2.). Then since $A^T$ is not one to one, it follows there exists $\mathbf{x} \neq \mathbf{0}$ such that

\[ A^T \mathbf{x} = \mathbf{0}. \]

Taking the transpose of both sides yields

\[ \mathbf{x}^T A = \mathbf{0} \]

where the $\mathbf{0}$ is a $1 \times n$ matrix or row vector. Now if $A\mathbf{y} = \mathbf{x}$, then

\[ |\mathbf{x}|^2 = \mathbf{x}^T (A\mathbf{y}) = \left( \mathbf{x}^T A \right) \mathbf{y} = \mathbf{0}\mathbf{y} = 0 \]

contrary to $\mathbf{x} \neq \mathbf{0}$. Consequently there can be no $\mathbf{y}$ such that $A\mathbf{y} = \mathbf{x}$ and so $A$ is not onto. This shows that 2.) implies 3.).

Finally, suppose 3.). If 1.) does not hold, then $\det(A) \neq 0$ but then from Theorem 7.2.14 $A^{-1}$ exists and so for every $\mathbf{y} \in \mathbb{F}^n$ there exists a unique $\mathbf{x} \in \mathbb{F}^n$ such that $A\mathbf{x} = \mathbf{y}$. In fact $\mathbf{x} = A^{-1}\mathbf{y}$. Thus $A$ would be onto contrary to 3.). This shows 3.) implies 1.). ■

Corollary 8.5.11 Let $A$ be an $n \times n$ matrix. Then the following are equivalent.

1. $\det(A) \neq 0$.

2. $A$ and $A^T$ are one to one.

3. $A$ is onto.

Proof: This follows immediately from the above theorem. ■

Corollary 8.5.12 Let $A$ be an invertible $n \times n$ matrix. Then $A$ equals a finite product of elementary matrices.

Proof: Since $A^{-1}$ is given to exist, $\det(A) \neq 0$ and it follows $A$ must have rank $n$ and so the row reduced echelon form of $A$ is $I$. Therefore, by Theorem 8.1.6 there is a sequence of elementary matrices $E_1, \cdots, E_p$ which accomplish successive row operations such that

$$(E_p E_{p-1} \cdots E_1) A = I.$$

But now multiply on the left on both sides by $E_p^{-1}$, then by $E_{p-1}^{-1}$, and then by $E_{p-2}^{-1}$ etc. until you get

$$A = E_1^{-1} E_2^{-1} \cdots E_{p-1}^{-1} E_p^{-1}$$

and by Theorem 8.1.6 each of these in this product is an elementary matrix. ■
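The proof of Corollary 8.5.12 is constructive, and the construction can be sketched in a few lines of Python. The function names `identity`, `matmul`, and `elementary_factors` are hypothetical helpers introduced only for this illustration, and the 2 x 2 test matrix is an arbitrary choice. Each row operation used to reduce $A$ to $I$ is recorded through its inverse, so the recorded matrices multiply back to $A$ in the order they were found.

```python
from fractions import Fraction

def identity(n):
    """n x n identity matrix with exact rational entries."""
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def elementary_factors(A):
    """Return elementary matrices F_1, ..., F_p with A = F_1 F_2 ... F_p.

    Each F_i is the inverse of one row operation used to reduce A to I.
    """
    n = len(A)
    M = [[Fraction(v) for v in row] for row in A]
    factors = []
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is not invertible")
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            E = identity(n)
            E[col], E[pivot] = E[pivot], E[col]
            factors.append(E)          # a row swap is its own inverse
        p = M[col][col]
        if p != 1:
            M[col] = [v / p for v in M[col]]
            E = identity(n)
            E[col][col] = p            # undoes "divide row col by p"
            factors.append(E)
        for r in range(n):
            if r != col and M[r][col] != 0:
                c = M[r][col]
                M[r] = [a - c * b for a, b in zip(M[r], M[col])]
                E = identity(n)
                E[r][col] = c          # undoes "subtract c times row col from row r"
                factors.append(E)
    return factors

A = [[0, 2], [1, 3]]                   # arbitrary invertible example
factors = elementary_factors(A)
product = identity(2)
for F in factors:
    product = matmul(product, F)
print(product == A)
```

Exact fractions are used instead of floating point so that the final comparison with $A$ is an equality test rather than a tolerance check.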
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8.6 Exercises 

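1. Let $\{\mathbf{u}_1, \cdots, \mathbf{u}_n\}$ be vectors in $\mathbb{R}^n$. The parallelepiped determined by these vectors, $P(\mathbf{u}_1, \cdots, \mathbf{u}_n)$, is defined as

$$P(\mathbf{u}_1, \cdots, \mathbf{u}_n) = \left\{ \sum_{k=1}^n t_k \mathbf{u}_k : t_k \in [0,1] \text{ for all } k \right\}.$$

Now let $A$ be an $n \times n$ matrix. Show that

$$\{A\mathbf{x} : \mathbf{x} \in P(\mathbf{u}_1, \cdots, \mathbf{u}_n)\}$$

is also a parallelepiped.

2. In the context of Problem 1, draw $P(\mathbf{e}_1, \mathbf{e}_2)$ where $\mathbf{e}_1, \mathbf{e}_2$ are the standard basis vectors for $\mathbb{R}^2$. Thus $\mathbf{e}_1 = (1, 0)$, $\mathbf{e}_2 = (0, 1)$. Now suppose

$$E = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$$

where $E$ is the elementary matrix which takes the second row and adds to the first. Draw

$$\{E\mathbf{x} : \mathbf{x} \in P(\mathbf{e}_1, \mathbf{e}_2)\}.$$

In other words, draw the result of doing $E$ to the vectors in $P(\mathbf{e}_1, \mathbf{e}_2)$. Next draw the results of doing the other elementary matrices to $P(\mathbf{e}_1, \mathbf{e}_2)$.

3. In the context of Problem 1, either draw or describe the result of doing elementary matrices to $P(\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3)$. Describe geometrically the conclusion of Corollary 8.5.12.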

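4. Determine which matrices are in row reduced echelon form.

(a) $\begin{pmatrix} 1 & 2 & 0 \\ 0 & 1 & 7 \end{pmatrix}$

(b) $\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 2 \\ 0 & 0 & 0 & 0 \end{pmatrix}$

(c) $\begin{pmatrix} 1 & 1 & 0 & 0 & 0 & 5 \\ 0 & 0 & 1 & 2 & 0 & 4 \\ 0 & 0 & 0 & 0 & 1 & 3 \end{pmatrix}$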

5. Row reduce the following matrices to obtain the row reduced echelon form. List the pivot 
columns in the original matrix. 

(a) 



(b) 



(c) 
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6. Find the rank of the following matrices. If the rank is r, identify r columns in the origi- 
nal matrix which have the property that every other column may be written as a linear 
combination of these. Also find a basis for the row and column spaces of the matrices. 



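(a) $\begin{pmatrix} 1 & 2 & 0 \\ 3 & 2 & 1 \\ 2 & 1 & 0 \\ 0 & 2 & 1 \end{pmatrix}$

(b) $\begin{pmatrix} 1 & 0 & 0 \\ 4 & 1 & 1 \\ 2 & 1 & 0 \\ 0 & 2 & 0 \end{pmatrix}$

(c) $\begin{pmatrix} 0 & 1 & 0 & 2 & 1 & 2 & 2 \\ 0 & 3 & 2 & 12 & 1 & 6 & 8 \\ 0 & 1 & 1 & 5 & 0 & 2 & 3 \\ 0 & 2 & 1 & 7 & 0 & 3 & 4 \end{pmatrix}$

(d) $\begin{pmatrix} 0 & 1 & 0 & 2 & 0 & 1 & 0 \\ 0 & 3 & 2 & 6 & 0 & 5 & 4 \\ 0 & 1 & 1 & 2 & 0 & 2 & 2 \\ 0 & 2 & 1 & 4 & 0 & 3 & 2 \end{pmatrix}$

(e) $\begin{pmatrix} 0 & 1 & 0 & 2 & 1 & 1 & 2 \\ 0 & 3 & 2 & 6 & 1 & 5 & 1 \\ 0 & 1 & 1 & 2 & 0 & 2 & 1 \\ 0 & 2 & 1 & 4 & 0 & 3 & 1 \end{pmatrix}$
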



7. Suppose A is an m x n matrix. Explain why the rank of A is always no larger than min (m,n) . 

8. Let $H$ denote span [vectors missing in source]. Find the dimension of $H$ and determine a basis.

9. Let $H$ denote span [vectors missing in source]. Find the dimension of $H$ and determine a basis.

10. Let $H$ denote span [vectors missing in source]. Find the dimension of $H$ and determine a basis.



11. Let $M = \{\mathbf{u} = (u_1, u_2, u_3, u_4) \in \mathbb{R}^4 : u_3 = u_1 = 0\}$. Is $M$ a subspace? Explain.

12. Let $M = \{\mathbf{u} = (u_1, u_2, u_3, u_4) \in \mathbb{R}^4 : u_3 \geq u_1\}$. Is $M$ a subspace? Explain.

13. Let $\mathbf{w} \in \mathbb{R}^4$ and let $M = \{\mathbf{u} = (u_1, u_2, u_3, u_4) \in \mathbb{R}^4 : \mathbf{w} \cdot \mathbf{u} = 0\}$. Is $M$ a subspace? Explain.

14. Let $M = \{\mathbf{u} = (u_1, u_2, u_3, u_4) \in \mathbb{R}^4 : u_i \geq 0 \text{ for each } i = 1, 2, 3, 4\}$. Is $M$ a subspace? Explain.

15. Let $\mathbf{w}, \mathbf{w}_1$ be given vectors in $\mathbb{R}^4$ and define

$$M = \{\mathbf{u} = (u_1, u_2, u_3, u_4) \in \mathbb{R}^4 : \mathbf{w} \cdot \mathbf{u} = 0 \text{ and } \mathbf{w}_1 \cdot \mathbf{u} = 0\}.$$

Is $M$ a subspace? Explain.

16. Let $M = \{\mathbf{u} = (u_1, u_2, u_3, u_4) \in \mathbb{R}^4 : |u_1| \leq 4\}$. Is $M$ a subspace? Explain.

17. Let $M = \{\mathbf{u} = (u_1, u_2, u_3, u_4) \in \mathbb{R}^4 : \sin(u_1) = 1\}$. Is $M$ a subspace? Explain.

18. Study the definition of span. Explain what is meant by the span of a set of vectors. Include 
pictures. 

19. Suppose $\{\mathbf{x}_1, \cdots, \mathbf{x}_k\}$ is a set of vectors from $\mathbb{F}^n$. Show that span $(\mathbf{x}_1, \cdots, \mathbf{x}_k)$ contains $\mathbf{0}$.

20. Study the definition of linear independence. Explain in your own words what is meant by 
linear independence and linear dependence. Illustrate with pictures. 

21. Use Corollary 8.4.17 to prove the following theorem: If $A, B$ are $n \times n$ matrices and if $AB = I$, then $BA = I$ and $B = A^{-1}$. Hint: First note that if $AB = I$, then it must be the case that $A$ is onto. Explain why this requires span (columns of $A$) $= \mathbb{F}^n$. Now explain why, using the corollary, this requires $A$ to be one to one. Next explain why $A(BA - I) = 0$ and why the fact that $A$ is one to one implies $BA = I$.

22. Here are three vectors. Determine whether they are linearly independent or linearly dependent. [vectors missing in source]

23. Here are three vectors. Determine whether they are linearly independent or linearly dependent. [vectors missing in source]

24. Here are three vectors. Determine whether they are linearly independent or linearly dependent. [vectors missing in source]

25. Here are four vectors. Determine whether they span $\mathbb{R}^3$. Are these vectors linearly independent? [vectors missing in source]

26. Here are four vectors. Determine whether they span $\mathbb{R}^3$. Are these vectors linearly independent? [vectors missing in source]

27. Determine whether the following vectors are a basis for $\mathbb{R}^3$. If they are, explain why they are and if they are not, give a reason and tell whether they span $\mathbb{R}^3$. [vectors missing in source]

28. Determine whether the following vectors are a basis for $\mathbb{R}^3$. If they are, explain why they are and if they are not, give a reason and tell whether they span $\mathbb{R}^3$. [vectors missing in source]

29. Determine whether the following vectors are a basis for $\mathbb{R}^3$. If they are, explain why they are and if they are not, give a reason and tell whether they span $\mathbb{R}^3$. [vectors missing in source]

30. Determine whether the following vectors are a basis for $\mathbb{R}^3$. If they are, explain why they are and if they are not, give a reason and tell whether they span $\mathbb{R}^3$. [vectors missing in source]


31. Consider the vectors of the form

$$\left\{ \begin{pmatrix} 2t + 3s \\ s - t \\ t + s \end{pmatrix} : s, t \in \mathbb{R} \right\}.$$

Is this set of vectors a subspace of $\mathbb{R}^3$? If so, explain why, give a basis for the subspace and find its dimension.

32. Consider the vectors of the form

$$\left\{ \begin{pmatrix} 2t + 3s + u \\ s - t \\ t + s \\ u \end{pmatrix} : s, t, u \in \mathbb{R} \right\}.$$

Is this set of vectors a subspace of $\mathbb{R}^4$? If so, explain why, give a basis for the subspace and find its dimension.

33. Consider the vectors of the form

$$\left\{ \begin{pmatrix} 2t + u \\ t + 3u \\ t + s + v \\ u \end{pmatrix} : s, t, u, v \in \mathbb{R} \right\}.$$

Is this set of vectors a subspace of $\mathbb{R}^4$? If so, explain why, give a basis for the subspace and find its dimension.

34. If you have 5 vectors in F 5 and the vectors are linearly independent, can it always be concluded 
they span F 5 ? Explain. 

35. If you have 6 vectors in F 5 , is it possible they are linearly independent? Explain. 

36. Suppose $A$ is an $m \times n$ matrix and $\{\mathbf{w}_1, \cdots, \mathbf{w}_k\}$ is a linearly independent set of vectors in $A(\mathbb{F}^n) \subseteq \mathbb{F}^m$. Now suppose $A(\mathbf{z}_i) = \mathbf{w}_i$. Show $\{\mathbf{z}_1, \cdots, \mathbf{z}_k\}$ is also independent.

37. Suppose $V, W$ are subspaces of $\mathbb{F}^n$. Show $V \cap W$, defined to be all vectors which are in both $V$ and $W$, is a subspace also.

38. Suppose $V$ and $W$ both have dimension equal to 7 and they are subspaces of $\mathbb{F}^{10}$. What are the possibilities for the dimension of $V \cap W$? Hint: Remember that a linearly independent set can be extended to form a basis.

39. Suppose $V$ has dimension $p$ and $W$ has dimension $q$ and they are each contained in a subspace $U$ which has dimension equal to $n$ where $n \geq \max(p, q)$. What are the possibilities for the dimension of $V \cap W$? Hint: Remember that a linearly independent set can be extended to form a basis.

40. If $\mathbf{b} \neq \mathbf{0}$, can the solution set of $A\mathbf{x} = \mathbf{b}$ be a plane through the origin? Explain.

41. Suppose a system of equations has fewer equations than variables and you have found a solution 
to this system of equations. Is it possible that your solution is the only one? Explain. 

42. Suppose a system of linear equations has a 2 x 4 augmented matrix and the last column is a 
pivot column. Could the system of linear equations be consistent? Explain. 

43. Suppose the coefficient matrix of a system of n equations with n variables has the property 
that every column is a pivot column. Does it follow that the system of equations must have a 
solution? If so, must the solution be unique? Explain. 

44. Suppose there is a unique solution to a system of linear equations. What must be true of the 
pivot columns in the augmented matrix. 

45. State whether each of the following sets of data are possible for the matrix equation $A\mathbf{x} = \mathbf{b}$. If possible, describe the solution set. That is, tell whether there exists a unique solution, no solution, or infinitely many solutions.

(a) $A$ is a $5 \times 6$ matrix, rank $(A) = 4$ and rank $(A|\mathbf{b}) = 4$. Hint: This says $\mathbf{b}$ is in the span of four of the columns. Thus the columns are not independent.

(b) $A$ is a $3 \times 4$ matrix, rank $(A) = 3$ and rank $(A|\mathbf{b}) = 2$.

(c) $A$ is a $4 \times 2$ matrix, rank $(A) = 4$ and rank $(A|\mathbf{b}) = 4$. Hint: This says $\mathbf{b}$ is in the span of the columns and the columns must be independent.

(d) $A$ is a $5 \times 5$ matrix, rank $(A) = 4$ and rank $(A|\mathbf{b}) = 5$. Hint: This says $\mathbf{b}$ is not in the span of the columns.

(e) $A$ is a $4 \times 2$ matrix, rank $(A) = 2$ and rank $(A|\mathbf{b}) = 2$.

46. Suppose $A$ is an $m \times n$ matrix in which $m \leq n$. Suppose also that the rank of $A$ equals $m$. Show that $A$ maps $\mathbb{F}^n$ onto $\mathbb{F}^m$. Hint: The vectors $\mathbf{e}_1, \cdots, \mathbf{e}_m$ occur as columns in the row reduced echelon form for $A$.

47. Suppose $A$ is an $m \times n$ matrix in which $m \geq n$. Suppose also that the rank of $A$ equals $n$. Show that $A$ is one to one. Hint: If not, there exists a nonzero vector $\mathbf{x}$ such that $A\mathbf{x} = \mathbf{0}$, and this implies at least one column of $A$ is a linear combination of the others. Show this would require the column rank to be less than $n$.

48. Explain why an $n \times n$ matrix $A$ is both one to one and onto if and only if its rank is $n$.

49. Suppose $A$ is an $m \times n$ matrix and $B$ is an $n \times p$ matrix. Show that

$$\dim(\ker(AB)) \leq \dim(\ker(A)) + \dim(\ker(B)).$$

Hint: Consider the subspace $B(\mathbb{F}^p) \cap \ker(A)$ and suppose a basis for this subspace is

$$\{\mathbf{w}_1, \cdots, \mathbf{w}_k\}.$$

Now suppose $\{\mathbf{u}_1, \cdots, \mathbf{u}_r\}$ is a basis for $\ker(B)$. Let $\{\mathbf{z}_1, \cdots, \mathbf{z}_k\}$ be such that $B\mathbf{z}_i = \mathbf{w}_i$ and argue that

$$\ker(AB) \subseteq \text{span}(\mathbf{u}_1, \cdots, \mathbf{u}_r, \mathbf{z}_1, \cdots, \mathbf{z}_k).$$

Here is how you do this. Suppose $AB\mathbf{x} = \mathbf{0}$. Then $B\mathbf{x} \in \ker(A) \cap B(\mathbb{F}^p)$ and so, for suitable scalars $c_i$, $B\mathbf{x} = \sum_{i=1}^k c_i \mathbf{w}_i = \sum_{i=1}^k c_i B\mathbf{z}_i$, showing that

$$\mathbf{x} - \sum_{i=1}^k c_i \mathbf{z}_i \in \ker(B).$$

50. Explain why $A\mathbf{x} = \mathbf{0}$ always has a solution even when $A^{-1}$ does not exist.

(a) What can you conclude about $A$ if the solution is unique?

(b) What can you conclude about $A$ if the solution is not unique?

51. Suppose $\det(A - \lambda I) = 0$. Show using Theorem 9.2.9 there exists $\mathbf{x} \neq \mathbf{0}$ such that

$$(A - \lambda I)\mathbf{x} = \mathbf{0}.$$

52. Let $A$ be an $n \times n$ matrix and let $\mathbf{x}$ be a nonzero vector such that $A\mathbf{x} = \lambda\mathbf{x}$ for some scalar $\lambda$. When this occurs, the vector $\mathbf{x}$ is called an eigenvector and the scalar $\lambda$ is called an eigenvalue. It turns out that not every number is an eigenvalue. Only certain ones are. Why? Hint: Show that if $A\mathbf{x} = \lambda\mathbf{x}$, then $(A - \lambda I)\mathbf{x} = \mathbf{0}$. Explain why this shows that $(A - \lambda I)$ is not one to one and not onto. Now use Theorem 9.2.9 to argue $\det(A - \lambda I) = 0$. What sort of equation is this? How many solutions does it have?

53. Let $m < n$ and let $A$ be an $m \times n$ matrix. Show that $A$ is not one to one. Hint: Consider the $n \times n$ matrix $A_1$ which is of the form

$$A_1 \equiv \begin{pmatrix} A \\ 0 \end{pmatrix}$$

where the $0$ denotes an $(n - m) \times n$ matrix of zeros. Thus $\det A_1 = 0$ and so $A_1$ is not one to one. Now observe that $A_1\mathbf{x}$ is the vector

$$A_1\mathbf{x} = \begin{pmatrix} A\mathbf{x} \\ \mathbf{0} \end{pmatrix}$$

which equals zero if and only if $A\mathbf{x} = \mathbf{0}$. Do this using the Fredholm alternative.

54. Let $A$ be an $m \times n$ real matrix and let $\mathbf{b} \in \mathbb{R}^m$. Show there exists a solution $\mathbf{x}$ to the system

$$A^T A\mathbf{x} = A^T\mathbf{b}.$$

Next show that if $\mathbf{x}, \mathbf{x}_1$ are two solutions, then $A\mathbf{x} = A\mathbf{x}_1$. Hint: First show that $(A^T A)^T = A^T A$. Next show if $\mathbf{x} \in \ker(A^T A)$, then $A\mathbf{x} = \mathbf{0}$. Finally apply the Fredholm alternative. This will give existence of a solution.

55. Show that in the context of Problem 54, if $\mathbf{x}$ is the solution there, then $|\mathbf{b} - A\mathbf{x}| \leq |\mathbf{b} - A\mathbf{y}|$ for every $\mathbf{y}$. Thus $A\mathbf{x}$ is the point of $A(\mathbb{R}^n)$ which is closest to $\mathbf{b}$ of every point in $A(\mathbb{R}^n)$.

56. Let $A$ be an $n \times n$ matrix and consider the matrices $\{I, A, A^2, \cdots, A^{n^2}\}$. Explain why there exist scalars $c_i$, not all zero, such that

$$\sum_{i=0}^{n^2} c_i A^i = 0.$$

Then argue there exists a polynomial $p(\lambda)$ of the form

$$\lambda^m + d_{m-1}\lambda^{m-1} + \cdots + d_1\lambda + d_0$$

such that $p(A) = 0$ and if $q(\lambda)$ is another polynomial such that $q(A) = 0$, then $q(\lambda)$ is of the form $p(\lambda) l(\lambda)$ for some polynomial $l(\lambda)$. This extra special polynomial $p(\lambda)$ is called the minimal polynomial. Hint: You might consider an $n \times n$ matrix as a vector in $\mathbb{F}^{n^2}$.

9.1 Linear Transformations

An m x n matrix can be used to transform vectors in F n to vectors in F m through the use of matrix 
multiplication. 

Example 9.1.1 Consider the matrix [matrix missing in source]. Think of it as a function which takes vectors in $\mathbb{F}^3$ and makes them into vectors in $\mathbb{F}^2$ as follows. For $\begin{pmatrix} x \\ y \\ z \end{pmatrix}$ a vector in $\mathbb{F}^3$, multiply on the left by the given matrix to obtain the vector in $\mathbb{F}^2$. Here are some numerical examples. [numerical examples missing in source]

More generally, [general formula missing in source]

The idea is to define a function which takes vectors in $\mathbb{F}^3$ and delivers new vectors in $\mathbb{F}^2$.

This is an example of something called a linear transformation. 

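Definition 9.1.2 Let $T : \mathbb{F}^n \to \mathbb{F}^m$ be a function. Thus for each $\mathbf{x} \in \mathbb{F}^n$, $T\mathbf{x} \in \mathbb{F}^m$. Then $T$ is a linear transformation if whenever $\alpha, \beta$ are scalars and $\mathbf{x}_1$ and $\mathbf{x}_2$ are vectors in $\mathbb{F}^n$,

$$T(\alpha\mathbf{x}_1 + \beta\mathbf{x}_2) = \alpha T\mathbf{x}_1 + \beta T\mathbf{x}_2.$$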

A linear transformation is also called a homomorphism. In the case that T is in addition to this 
one to one and onto, it is sometimes called an isomorphism. 

The last two terms are typically used more in abstract algebra than in linear algebra so in this book, such mappings will be referred to as linear transformations. In words, linear transformations distribute across vector addition and allow you to factor out scalars.

At this point, recall the properties of matrix multiplication. The pertinent property is 5.14 on Page 107. Recall it states that for $a$ and $b$ scalars,

$$A(aB + bC) = aAB + bAC.$$

In particular, for $A$ an $m \times n$ matrix and $B$ and $C$ $n \times 1$ matrices (column vectors), the above formula holds, which is nothing more than the statement that matrix multiplication gives an example of a linear transformation.
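The statement that matrix multiplication is linear can be spot checked numerically. The following is only a sketch: the matrix, the vectors, and the scalars are arbitrary values chosen for illustration, and `matvec` and `combo` are hypothetical helper names. A single check like this illustrates the identity but of course does not prove it.

```python
from fractions import Fraction

def matvec(A, x):
    """Multiply the matrix A (a list of rows) by the column vector x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def combo(a, x, b, y):
    """Return the linear combination a*x + b*y of two vectors."""
    return [a * xi + b * yi for xi, yi in zip(x, y)]

# Illustrative data: a 2 x 3 matrix, two vectors in F^3, and two scalars.
A = [[1, 2, 0], [2, 1, 0]]
x, y = [1, 2, 3], [0, 7, 3]
a, b = Fraction(1, 2), -3

left = matvec(A, combo(a, x, b, y))               # A(ax + by)
right = combo(a, matvec(A, x), b, matvec(A, y))   # a(Ax) + b(Ay)
print(left == right)
```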

The reason this concept is so important is that there are many examples of things which are linear transformations. You might remember from calculus that the operator which consists of taking the derivative is a linear transformation. That is, if $f, g$ are functions (vectors) and $\alpha, \beta$ are numbers (scalars),

$$\frac{d}{dx}(\alpha f + \beta g) = \alpha \frac{d}{dx} f + \beta \frac{d}{dx} g.$$

Another example of a linear transformation is that of rotation through an angle. For example, I may want to rotate every vector through an angle of 45 degrees. Such a rotation would achieve something like the following if applied to each vector corresponding to points on the picture which is standing upright.

[Figure: an upright picture and its image after rotation through 45 degrees.]

More generally, denote a rotation by $T$. Why is such a transformation linear? Consider the following picture which illustrates a rotation.

[Figure: the parallelogram determined by $\mathbf{a}$ and $\mathbf{b}$ and its image under the rotation $T$.]


To get $T(\mathbf{a} + \mathbf{b})$, you can add $T\mathbf{a}$ and $T\mathbf{b}$. Here is why. If you add $T\mathbf{a}$ to $T\mathbf{b}$ you get the diagonal of the parallelogram determined by $T\mathbf{a}$ and $T\mathbf{b}$. This diagonal also results from rotating the diagonal of the parallelogram determined by $\mathbf{a}$ and $\mathbf{b}$. This is because the rotation preserves all angles between the vectors as well as their lengths. In particular, it preserves the shape of this parallelogram. Thus both $T\mathbf{a} + T\mathbf{b}$ and $T(\mathbf{a} + \mathbf{b})$ give the same directed line segment. Thus $T$ distributes across $+$ where $+$ refers to vector addition. Similarly, if $k$ is a number, $T(k\mathbf{a}) = kT\mathbf{a}$ (draw a picture) and so you can factor out scalars also. Thus rotations are an example of a linear transformation.

Definition 9.1.3 A linear transformation is called one to one (often written as $1-1$) if it never takes two different vectors to the same vector. Thus $T$ is one to one if whenever $\mathbf{x} \neq \mathbf{y}$,

$$T\mathbf{x} \neq T\mathbf{y}.$$

Equivalently, if $T(\mathbf{x}) = T(\mathbf{y})$, then $\mathbf{x} = \mathbf{y}$.

In the case that a linear transformation comes from matrix multiplication, it is common usage to refer to the matrix as a one to one matrix when the linear transformation it determines is one to one.

Definition 9.1.4 A linear transformation mapping $\mathbb{F}^n$ to $\mathbb{F}^m$ is called onto if whenever $\mathbf{y} \in \mathbb{F}^m$ there exists $\mathbf{x} \in \mathbb{F}^n$ such that $T(\mathbf{x}) = \mathbf{y}$.

Thus $T$ is onto if everything in $\mathbb{F}^m$ gets hit. In the case that a linear transformation comes from matrix multiplication, it is common to refer to the matrix as onto when the linear transformation it determines is onto. Also it is common usage to write $T\mathbb{F}^n$, $T(\mathbb{F}^n)$, or $\text{Im}(T)$ as the set of vectors of $\mathbb{F}^m$ which are of the form $T\mathbf{x}$ for some $\mathbf{x} \in \mathbb{F}^n$. In the case that $T$ is obtained from multiplication by an $m \times n$ matrix $A$, it is standard to simply write $A(\mathbb{F}^n)$, $A\mathbb{F}^n$, or $\text{Im}(A)$ to denote those vectors in $\mathbb{F}^m$ which are obtained in the form $A\mathbf{x}$ for some $\mathbf{x} \in \mathbb{F}^n$.

9.2 Constructing The Matrix Of A Linear Transformation 

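It turns out that if $T$ is any linear transformation which maps $\mathbb{F}^n$ to $\mathbb{F}^m$, there is always an $m \times n$ matrix $A$ with the property that

$$A\mathbf{x} = T\mathbf{x} \tag{9.1}$$

for all $\mathbf{x} \in \mathbb{F}^n$. Here is why. Suppose $T : \mathbb{F}^n \to \mathbb{F}^m$ is a linear transformation and you want to find the matrix defined by this linear transformation as described in 9.1. Then if $\mathbf{x} \in \mathbb{F}^n$ it follows

$$\mathbf{x} = \sum_{i=1}^n x_i \mathbf{e}_i$$

where $\mathbf{e}_i$ is the vector which has zeros in every slot but the $i^{th}$ and a 1 in this slot. Then since $T$ is linear,

$$T\mathbf{x} = \sum_{i=1}^n x_i T(\mathbf{e}_i) = \begin{pmatrix} | & & | \\ T(\mathbf{e}_1) & \cdots & T(\mathbf{e}_n) \\ | & & | \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} \equiv A \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}$$

and so you see that the matrix desired is obtained from letting the $i^{th}$ column equal $T(\mathbf{e}_i)$. We state this as the following theorem.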

Theorem 9.2.1 Let $T$ be a linear transformation from $\mathbb{F}^n$ to $\mathbb{F}^m$. Then the matrix $A$ satisfying 9.1 is given by

$$\begin{pmatrix} | & & | \\ T(\mathbf{e}_1) & \cdots & T(\mathbf{e}_n) \\ | & & | \end{pmatrix}$$

where $T\mathbf{e}_i$ is the $i^{th}$ column of $A$.

9.2.1 Rotations in $\mathbb{R}^2$

Sometimes you need to find a matrix which represents a given linear transformation which is described in geometrical terms. The idea is to produce a matrix which you can multiply a vector by to get the same thing as some geometrical description. A good example of this is the problem of rotation of vectors discussed above. Consider the problem of rotating through an angle of $\theta$.

Example 9.2.2 Determine the matrix which represents the linear transformation defined by rotating every vector through an angle of $\theta$.

Let $\mathbf{e}_1 \equiv \begin{pmatrix} 1 \\ 0 \end{pmatrix}$ and $\mathbf{e}_2 \equiv \begin{pmatrix} 0 \\ 1 \end{pmatrix}$. These identify the geometric vectors which point along the positive $x$ axis and positive $y$ axis as shown.

[Figure: the unit circle with $T(\mathbf{e}_1)$ at $(\cos(\theta), \sin(\theta))$ and $T(\mathbf{e}_2)$ at $(-\sin(\theta), \cos(\theta))$.]

From the above, you only need to find $T\mathbf{e}_1$ and $T\mathbf{e}_2$, the first being the first column of the desired matrix $A$ and the second being the second column. From the definition of the cos, sin the coordinates of $T(\mathbf{e}_1)$ are as shown in the picture. The coordinates of $T(\mathbf{e}_2)$ also follow from simple trigonometry. Thus

$$T\mathbf{e}_1 = \begin{pmatrix} \cos\theta \\ \sin\theta \end{pmatrix}, \quad T\mathbf{e}_2 = \begin{pmatrix} -\sin\theta \\ \cos\theta \end{pmatrix}.$$

Therefore, from Theorem 9.2.1,

$$A = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}.$$

For those who prefer a more algebraic approach, the definition of $(\cos(\theta), \sin(\theta))$ is as the $x$ and $y$ coordinates of the point obtained by rotating $(1, 0)$ through an angle of $\theta$. Now the point of the vector from $(0, 0)$ to $(0, 1)$, $\mathbf{e}_2$, is exactly $\pi/2$ further along the unit circle. Therefore, when it is rotated through an angle of $\theta$ the $x$ and $y$ coordinates are given by

$$(x, y) = (\cos(\theta + \pi/2), \sin(\theta + \pi/2)) = (-\sin\theta, \cos\theta).$$
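The recipe of Theorem 9.2.1 (column $i$ of the matrix is $T(\mathbf{e}_i)$) can be tried out on the rotation of Example 9.2.2. In the sketch below, `matrix_of` and `rotate` are hypothetical names introduced for illustration. The rotation is implemented geometrically, through polar coordinates, so that the matrix is genuinely recovered from the values $T(\mathbf{e}_1)$, $T(\mathbf{e}_2)$ rather than assumed; the angle $\pi/3$ is an arbitrary choice.

```python
import math

def matrix_of(T, n):
    """Matrix of a linear map T on F^n: column i is T(e_i)."""
    columns = []
    for i in range(n):
        e = [1.0 if j == i else 0.0 for j in range(n)]
        columns.append(T(e))
    m = len(columns[0])
    # columns[i][r] is the entry in row r, column i
    return [[columns[i][r] for i in range(n)] for r in range(m)]

def rotate(theta):
    """Return the map that rotates a vector of R^2 through the angle theta."""
    def T(v):
        r = math.hypot(v[0], v[1])        # length of v
        phi = math.atan2(v[1], v[0])      # angle of v with the positive x axis
        return [r * math.cos(phi + theta), r * math.sin(phi + theta)]
    return T

theta = math.pi / 3
A = matrix_of(rotate(theta), 2)
expected = [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta),  math.cos(theta)]]
print(A)
print(expected)
```

Up to floating point rounding, the two printed matrices agree.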

Example 9.2.3 Find the matrix of the linear transformation which is obtained by first rotating all vectors through an angle of $\phi$ and then through an angle $\theta$. Thus you want the linear transformation which rotates all vectors through an angle of $\theta + \phi$.

Let $T_{\theta+\phi}$ denote the linear transformation which rotates every vector through an angle of $\theta + \phi$. Then to get $T_{\theta+\phi}$, you could first do $T_\phi$ and then do $T_\theta$ where $T_\phi$ is the linear transformation which rotates through an angle of $\phi$ and $T_\theta$ is the linear transformation which rotates through an angle of $\theta$. Denoting the corresponding matrices by $A_{\theta+\phi}$, $A_\phi$, and $A_\theta$, you must have for every $\mathbf{x}$

$$A_{\theta+\phi}\mathbf{x} = T_{\theta+\phi}\mathbf{x} = T_\theta T_\phi \mathbf{x} = A_\theta A_\phi \mathbf{x}.$$

Consequently, you must have

$$A_{\theta+\phi} = \begin{pmatrix} \cos(\theta+\phi) & -\sin(\theta+\phi) \\ \sin(\theta+\phi) & \cos(\theta+\phi) \end{pmatrix} = A_\theta A_\phi = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} \cos\phi & -\sin\phi \\ \sin\phi & \cos\phi \end{pmatrix}.$$

You know how to multiply matrices. Do so to the pair on the right. This yields

$$\begin{pmatrix} \cos(\theta+\phi) & -\sin(\theta+\phi) \\ \sin(\theta+\phi) & \cos(\theta+\phi) \end{pmatrix} = \begin{pmatrix} \cos\theta\cos\phi - \sin\theta\sin\phi & -\cos\theta\sin\phi - \sin\theta\cos\phi \\ \sin\theta\cos\phi + \cos\theta\sin\phi & \cos\theta\cos\phi - \sin\theta\sin\phi \end{pmatrix}.$$

Don't these look familiar? They are the usual trig identities for the sum of two angles derived here using linear algebra concepts.
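The identity $A_{\theta+\phi} = A_\theta A_\phi$ is easy to check numerically. This is a minimal sketch in which `rotation` and `matmul` are hypothetical helper names and the two angles are arbitrary test values.

```python
import math

def rotation(t):
    """Matrix which rotates vectors of R^2 through the angle t."""
    return [[math.cos(t), -math.sin(t)],
            [math.sin(t),  math.cos(t)]]

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

theta, phi = 0.7, 1.9                     # arbitrary angles in radians
product = matmul(rotation(theta), rotation(phi))
direct = rotation(theta + phi)
max_gap = max(abs(product[i][j] - direct[i][j])
              for i in range(2) for j in range(2))
print(max_gap)                            # on the order of rounding error
```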

You do not have to stop with two dimensions. You can consider rotations and other geometric 
concepts in any number of dimensions. This is one of the major advantages of linear algebra. You can 
break down a difficult geometrical procedure into small steps, each corresponding to multiplication by 
an appropriate matrix. Then by multiplying the matrices, you can obtain a single matrix which can 
give you numerical information on the results of applying the given sequence of simple procedures. 
That which you could never visualize can still be understood to the extent of finding exact numerical 
answers. Another example follows. 

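Example 9.2.4 Find the matrix of the linear transformation which is obtained by first rotating all vectors through an angle of $\pi/6$ and then reflecting through the $x$ axis.

As shown in Example 9.2.2, the matrix of the transformation which involves rotating through an angle of $\pi/6$ is

$$\begin{pmatrix} \cos(\pi/6) & -\sin(\pi/6) \\ \sin(\pi/6) & \cos(\pi/6) \end{pmatrix} = \begin{pmatrix} \frac{1}{2}\sqrt{3} & -\frac{1}{2} \\ \frac{1}{2} & \frac{1}{2}\sqrt{3} \end{pmatrix}.$$

The matrix for the transformation which reflects all vectors through the $x$ axis is

$$\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.$$

Therefore, the matrix of the linear transformation which first rotates through $\pi/6$ and then reflects through the $x$ axis is

$$\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \begin{pmatrix} \frac{1}{2}\sqrt{3} & -\frac{1}{2} \\ \frac{1}{2} & \frac{1}{2}\sqrt{3} \end{pmatrix} = \begin{pmatrix} \frac{1}{2}\sqrt{3} & -\frac{1}{2} \\ -\frac{1}{2} & -\frac{1}{2}\sqrt{3} \end{pmatrix}.$$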
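The order of the factors in Example 9.2.4 matters: the transformation applied first stands on the right. The sketch below, with a hypothetical helper `matmul`, computes the product in both orders so the difference is visible.

```python
import math

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

t = math.pi / 6
rot = [[math.cos(t), -math.sin(t)],
       [math.sin(t),  math.cos(t)]]       # rotate through pi/6
refl = [[1, 0],
        [0, -1]]                          # reflect through the x axis

rotate_then_reflect = matmul(refl, rot)   # the matrix of Example 9.2.4
reflect_then_rotate = matmul(rot, refl)   # a different transformation
print(rotate_then_reflect)
print(reflect_then_rotate)
```

The first product has rows approximately $(0.866, -0.5)$ and $(-0.5, -0.866)$, while the second has rows approximately $(0.866, 0.5)$ and $(0.5, -0.866)$.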
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9.2.2 Rotations About A Particular Vector 

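The problem is to find the matrix of the linear transformation which rotates all vectors about a given unit vector $\mathbf{u}$ which is possibly not one of the coordinate vectors $\mathbf{i}$, $\mathbf{j}$, or $\mathbf{k}$. Suppose for $|c| \neq 1$

$$\mathbf{u} = (a, b, c), \quad \sqrt{a^2 + b^2 + c^2} = 1.$$

First I will produce a matrix which maps $\mathbf{u}$ to $\mathbf{k}$ such that the right handed rotation about $\mathbf{k}$ corresponds to the right handed rotation about $\mathbf{u}$. Then I will rotate about $\mathbf{k}$ and finally, I will multiply by the inverse of the first matrix to get the desired result.

To begin, find vectors $\mathbf{w}, \mathbf{v}$ such that $\mathbf{w} \times \mathbf{v} = \mathbf{u}$. Let

$$\mathbf{w} = \left( \frac{-b}{\sqrt{a^2+b^2}}, \frac{a}{\sqrt{a^2+b^2}}, 0 \right).$$

This vector is clearly perpendicular to $\mathbf{u}$. Then $\mathbf{v} = (a, b, c) \times \mathbf{w} \equiv \mathbf{u} \times \mathbf{w}$. Thus from the geometric description of the cross product, $\mathbf{w} \times \mathbf{v} = \mathbf{u}$. Computing the cross product gives

$$\mathbf{v} = (a, b, c) \times \left( \frac{-b}{\sqrt{a^2+b^2}}, \frac{a}{\sqrt{a^2+b^2}}, 0 \right) = \left( \frac{-ca}{\sqrt{a^2+b^2}}, \frac{-cb}{\sqrt{a^2+b^2}}, \frac{a^2+b^2}{\sqrt{a^2+b^2}} \right).$$

Now I want to have $T\mathbf{w} = \mathbf{i}$, $T\mathbf{v} = \mathbf{j}$, $T\mathbf{u} = \mathbf{k}$. What does this? It is the inverse of the matrix which takes $\mathbf{i}$ to $\mathbf{w}$, $\mathbf{j}$ to $\mathbf{v}$, and $\mathbf{k}$ to $\mathbf{u}$. This matrix is

$$\begin{pmatrix} \frac{-b}{\sqrt{a^2+b^2}} & \frac{-ca}{\sqrt{a^2+b^2}} & a \\ \frac{a}{\sqrt{a^2+b^2}} & \frac{-cb}{\sqrt{a^2+b^2}} & b \\ 0 & \sqrt{a^2+b^2} & c \end{pmatrix}.$$

Its inverse is

$$\begin{pmatrix} \frac{-b}{\sqrt{a^2+b^2}} & \frac{a}{\sqrt{a^2+b^2}} & 0 \\ \frac{-ca}{\sqrt{a^2+b^2}} & \frac{-cb}{\sqrt{a^2+b^2}} & \sqrt{a^2+b^2} \\ a & b & c \end{pmatrix}.$$

Therefore, the matrix which does the rotating is

$$\begin{pmatrix} \frac{-b}{\sqrt{a^2+b^2}} & \frac{-ca}{\sqrt{a^2+b^2}} & a \\ \frac{a}{\sqrt{a^2+b^2}} & \frac{-cb}{\sqrt{a^2+b^2}} & b \\ 0 & \sqrt{a^2+b^2} & c \end{pmatrix} \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} \frac{-b}{\sqrt{a^2+b^2}} & \frac{a}{\sqrt{a^2+b^2}} & 0 \\ \frac{-ca}{\sqrt{a^2+b^2}} & \frac{-cb}{\sqrt{a^2+b^2}} & \sqrt{a^2+b^2} \\ a & b & c \end{pmatrix}.$$

This yields a matrix whose columns are

$$\begin{pmatrix} \dfrac{b^2\cos\theta + c^2a^2\cos\theta + a^4 + a^2b^2}{a^2+b^2} \\[2ex] \dfrac{-ba\cos\theta + cb^2\sin\theta + ca^2\sin\theta + c^2ab\cos\theta + ba^3 + b^3a}{a^2+b^2} \\[2ex] -(\sin\theta)b - (\cos\theta)ca + ca \end{pmatrix}, \quad \begin{pmatrix} \dfrac{-ba\cos\theta - ca^2\sin\theta - cb^2\sin\theta + c^2ab\cos\theta + ba^3 + b^3a}{a^2+b^2} \\[2ex] \dfrac{a^2\cos\theta + c^2b^2\cos\theta + a^2b^2 + b^4}{a^2+b^2} \\[2ex] (\sin\theta)a - (\cos\theta)cb + cb \end{pmatrix},$$

$$\begin{pmatrix} (\sin\theta)b - (\cos\theta)ca + ca \\ -(\sin\theta)a - (\cos\theta)cb + cb \\ (a^2+b^2)\cos\theta + c^2 \end{pmatrix}.$$

Using the assumption that $\mathbf{u}$ is a unit vector so that $a^2 + b^2 + c^2 = 1$, it follows the desired matrix is

$$\begin{pmatrix} \cos\theta - a^2\cos\theta + a^2 & -ba\cos\theta + ba - c\sin\theta & (\sin\theta)b - (\cos\theta)ca + ca \\ -ba\cos\theta + ba + c\sin\theta & -b^2\cos\theta + b^2 + \cos\theta & -(\sin\theta)a - (\cos\theta)cb + cb \\ -(\sin\theta)b - (\cos\theta)ca + ca & (\sin\theta)a - (\cos\theta)cb + cb & (1 - c^2)\cos\theta + c^2 \end{pmatrix}.$$

This was done under the assumption that $|c| \neq 1$. However, if this condition does not hold, you can verify directly that the above still gives the correct answer.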
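The final matrix for rotation about a unit vector $\mathbf{u} = (a, b, c)$ can be sanity checked numerically. The sketch below writes its entries in the equivalent grouped form $\cos\theta + (1 - \cos\theta)a^2$, $(1 - \cos\theta)ab - c\sin\theta$, and so on; this regrouping and the names `rotation_about`, `matvec` are choices made for this illustration. Two properties are checked: the axis $\mathbf{u}$ is left fixed, and for $\mathbf{u} = \mathbf{k}$ (the excluded case $|c| = 1$) the formula reduces to the usual rotation in the $xy$ plane. The axis $(1/3, 2/3, 2/3)$ and the angle are arbitrary test values.

```python
import math

def rotation_about(u, theta):
    """Right handed rotation through theta about the unit vector u = (a, b, c)."""
    a, b, c = u
    C, S = math.cos(theta), math.sin(theta)
    return [
        [C + (1 - C) * a * a,     (1 - C) * a * b - c * S, (1 - C) * a * c + b * S],
        [(1 - C) * a * b + c * S, C + (1 - C) * b * b,     (1 - C) * b * c - a * S],
        [(1 - C) * a * c - b * S, (1 - C) * b * c + a * S, C + (1 - C) * c * c],
    ]

def matvec(A, x):
    """Multiply the matrix A (a list of rows) by the column vector x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

u = (1 / 3, 2 / 3, 2 / 3)                 # a unit vector: 1/9 + 4/9 + 4/9 = 1
R = rotation_about(u, 1.1)
fixed = matvec(R, u)                      # should reproduce u up to rounding

Rk = rotation_about((0.0, 0.0, 1.0), 1.1) # rotation about k
print(fixed)
print(Rk)
```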
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9.2.3 Projections 

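In Physics it is important to consider the work done by a force field on an object. This involves the concept of projection onto a vector. Suppose you want to find the projection of a vector $\mathbf{v}$ onto the given vector $\mathbf{u}$, denoted by $\text{proj}_{\mathbf{u}}(\mathbf{v})$. This is done using the dot product as follows.

$$\text{proj}_{\mathbf{u}}(\mathbf{v}) = \left( \frac{\mathbf{v} \cdot \mathbf{u}}{\mathbf{u} \cdot \mathbf{u}} \right) \mathbf{u}$$

Because of properties of the dot product, the map $\mathbf{v} \mapsto \text{proj}_{\mathbf{u}}(\mathbf{v})$ is linear,

$$\text{proj}_{\mathbf{u}}(\alpha\mathbf{v} + \beta\mathbf{w}) = \left( \frac{(\alpha\mathbf{v} + \beta\mathbf{w}) \cdot \mathbf{u}}{\mathbf{u} \cdot \mathbf{u}} \right) \mathbf{u} = \alpha \left( \frac{\mathbf{v} \cdot \mathbf{u}}{\mathbf{u} \cdot \mathbf{u}} \right) \mathbf{u} + \beta \left( \frac{\mathbf{w} \cdot \mathbf{u}}{\mathbf{u} \cdot \mathbf{u}} \right) \mathbf{u}$$

$$= \alpha\, \text{proj}_{\mathbf{u}}(\mathbf{v}) + \beta\, \text{proj}_{\mathbf{u}}(\mathbf{w}).$$

Example 9.2.5 Let the projection map be defined above and let $\mathbf{u} = (1, 2, 3)^T$. Does this linear transformation come from multiplication by a matrix? If so, what is the matrix?

You can find this matrix in the same way as in the previous example. Let $\mathbf{e}_i$ denote the vector in $\mathbb{R}^n$ which has a 1 in the $i^{th}$ position and a zero everywhere else. Thus a typical vector $\mathbf{x} = (x_1, \cdots, x_n)^T$ can be written in a unique way as

$$\mathbf{x} = \sum_{j=1}^n x_j \mathbf{e}_j.$$

From the way you multiply a matrix by a vector, it follows that $\text{proj}_{\mathbf{u}}(\mathbf{e}_i)$ gives the $i^{th}$ column of the desired matrix. Therefore, it is only necessary to find

$$\text{proj}_{\mathbf{u}}(\mathbf{e}_i) \equiv \left( \frac{\mathbf{e}_i \cdot \mathbf{u}}{\mathbf{u} \cdot \mathbf{u}} \right) \mathbf{u}.$$

For the given vector in the example, this implies the columns of the desired matrix are

$$\frac{1}{14} \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}, \quad \frac{2}{14} \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}, \quad \frac{3}{14} \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}.$$

Hence the matrix is

$$\frac{1}{14} \begin{pmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \\ 3 & 6 & 9 \end{pmatrix}.$$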
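The projection matrix of Example 9.2.5 can be computed column by column, exactly as described, using exact fractions. In this sketch `projection_matrix` and `matvec` are hypothetical helper names. Since column $i$ is $\frac{u_i}{\mathbf{u} \cdot \mathbf{u}}\mathbf{u}$, the entry in row $r$ and column $i$ is $u_r u_i / (\mathbf{u} \cdot \mathbf{u})$.

```python
from fractions import Fraction

def projection_matrix(u):
    """Matrix of v -> proj_u(v) for a nonzero vector u with integer entries."""
    uu = sum(x * x for x in u)            # u . u
    n = len(u)
    # column i is proj_u(e_i) = (u_i / (u . u)) u
    return [[Fraction(u[r] * u[i], uu) for i in range(n)] for r in range(n)]

def matvec(A, x):
    """Multiply the matrix A (a list of rows) by the column vector x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

P = projection_matrix([1, 2, 3])
for row in P:
    print([str(entry) for entry in row])  # entries of (1/14) * [[1,2,3],[2,4,6],[3,6,9]]

print(matvec(P, [1, 1, 1]))               # projection of (1, 1, 1) onto u
```

Projecting twice changes nothing, so the matrix $P$ satisfies $P^2 = P$, which gives a quick consistency check on any matrix obtained this way.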





9.2.4 Matrices Which Are One To One Or Onto 

Lemma 9.2.6 Let $A$ be an $m \times n$ matrix. Then $A(\mathbb{F}^n) = \text{span}(\mathbf{a}_1, \cdots, \mathbf{a}_n)$ where $\mathbf{a}_1, \cdots, \mathbf{a}_n$ denote the columns of $A$. In fact, for $\mathbf{x} = (x_1, \cdots, x_n)^T$,

$$A\mathbf{x} = \sum_{k=1}^n x_k \mathbf{a}_k.$$

Proof: This follows from the definition of matrix multiplication in Definition 5.1.9 on Page 100. ■

The following is a theorem of major significance. First here is an interesting observation.

Observation 9.2.7 Let $A$ be an $m \times n$ matrix. Then $A$ is one to one if and only if $A\mathbf{x} = \mathbf{0}$ implies $\mathbf{x} = \mathbf{0}$.

Here is why: $A\mathbf{0} = A(\mathbf{0} + \mathbf{0}) = A\mathbf{0} + A\mathbf{0}$ and so $A\mathbf{0} = \mathbf{0}$.

Now suppose $A$ is one to one and $A\mathbf{x} = \mathbf{0}$. Then since $A\mathbf{0} = \mathbf{0}$, it follows $\mathbf{x} = \mathbf{0}$. Thus if $A$ is one to one and $A\mathbf{x} = \mathbf{0}$, then $\mathbf{x} = \mathbf{0}$.

Next suppose the condition that $A\mathbf{x} = \mathbf{0}$ implies $\mathbf{x} = \mathbf{0}$ is valid. Then if $A\mathbf{x} = A\mathbf{y}$, then $A(\mathbf{x} - \mathbf{y}) = \mathbf{0}$ and so from the condition, $\mathbf{x} - \mathbf{y} = \mathbf{0}$ so that $\mathbf{x} = \mathbf{y}$. Thus $A$ is one to one.

Theorem 9.2.8 Suppose $A$ is an $n \times n$ matrix. Then $A$ is one to one if and only if $A$ is onto. Also, if $B$ is an $n \times n$ matrix and $AB = I$, then it follows $BA = I$.

Proof: First suppose $A$ is one to one. Consider the vectors $\{A\mathbf{e}_1, \cdots, A\mathbf{e}_n\}$ where $\mathbf{e}_k$ is the column vector which is all zeros except for a 1 in the $k^{th}$ position. This set of vectors is linearly independent because if

$$\sum_{k=1}^n c_k A\mathbf{e}_k = \mathbf{0},$$

then since $A$ is linear,

$$A\left( \sum_{k=1}^n c_k \mathbf{e}_k \right) = \mathbf{0}$$

and since $A$ is one to one, it follows

$$\sum_{k=1}^n c_k \mathbf{e}_k = \mathbf{0}$$

which implies each $c_k = 0$. Therefore, $\{A\mathbf{e}_1, \cdots, A\mathbf{e}_n\}$ must be a basis for $\mathbb{F}^n$ by Corollary 8.4.17 on Page 190. It follows that for $\mathbf{y} \in \mathbb{F}^n$ there exist constants $c_i$ such that

$$\mathbf{y} = \sum_{k=1}^n c_k A\mathbf{e}_k = A\left( \sum_{k=1}^n c_k \mathbf{e}_k \right)$$

showing that, since $\mathbf{y}$ was arbitrary, $A$ is onto.

Next suppose $A$ is onto. This implies the span of the columns of $A$ equals $\mathbb{F}^n$ and by Corollary 8.4.17 this implies the columns of $A$ are independent. If $A\mathbf{x} = \mathbf{0}$, then letting $\mathbf{x} = (x_1, \cdots, x_n)^T$, it follows

$$\sum_{i=1}^n x_i \mathbf{a}_i = \mathbf{0}$$

and so each $x_i = 0$. If $A\mathbf{x} = A\mathbf{y}$, then $A(\mathbf{x} - \mathbf{y}) = \mathbf{0}$ and so $\mathbf{x} = \mathbf{y}$. This shows $A$ is one to one.

Now suppose $AB = I$. Why is $BA = I$? Since $AB = I$ it follows $B$ is one to one since otherwise, there would exist $\mathbf{x} \neq \mathbf{0}$ such that $B\mathbf{x} = \mathbf{0}$ and then $AB\mathbf{x} = A\mathbf{0} = \mathbf{0} \neq I\mathbf{x}$. Therefore, from what was just shown, $B$ is also onto. In addition to this, $A$ must be one to one because if $A\mathbf{y} = \mathbf{0}$, then $\mathbf{y} = B\mathbf{x}$ for some $\mathbf{x}$ and then $\mathbf{x} = AB\mathbf{x} = A\mathbf{y} = \mathbf{0}$ showing $\mathbf{y} = \mathbf{0}$. Now from what is given to be so, it follows $(AB)A = A$ and so using the associative law for matrix multiplication,

$$A(BA) - A = A(BA - I) = 0.$$

But this means $(BA - I)\mathbf{x} = \mathbf{0}$ for all $\mathbf{x}$ since otherwise, $A$ would not be one to one. Hence $BA = I$ as claimed. ■

This theorem shows that if an $n \times n$ matrix $B$ acts like an inverse when multiplied on one side of $A$, it follows that $B = A^{-1}$ and it will act like an inverse on both sides of $A$.

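The conclusion of this theorem pertains to square matrices only. For example, let

$$A = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad B = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & -1 \end{pmatrix} \tag{9.2}$$

Then

$$BA = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$

but

$$AB = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & -1 \\ 1 & 0 & 0 \end{pmatrix}.$$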
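A short computation shows how a one sided inverse can fail to be two sided when the matrices are not square. The $3 \times 2$ and $2 \times 3$ matrices below are illustrative choices, and `matmul` is a hypothetical helper name.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

A = [[1, 0],
     [0, 1],
     [1, 0]]                              # 3 x 2
B = [[1, 0, 0],
     [1, 1, -1]]                          # 2 x 3

BA = matmul(B, A)                         # the 2 x 2 identity
AB = matmul(A, B)                         # 3 x 3, not the identity
print(BA)
print(AB)
```

Here $AB$ has two equal rows, so it cannot be the identity even though $BA$ is.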

There is also an important characterization in terms of determinants. This is proved completely in the section on the mathematical theory of the determinant.

Theorem 9.2.9 Let $A$ be an $n \times n$ matrix and let $T_A$ denote the linear transformation determined by $A$. Then the following are equivalent.

1. $T_A$ is one to one.

2. $T_A$ is onto.

3. $\det(A) \neq 0$.

9.2.5 The General Solution Of A Linear System

Recall the following definition which was discussed above. 

Definition 9.2.10 $T$ is a linear transformation if whenever $\mathbf{x}, \mathbf{y}$ are vectors and $a, b$ scalars,

$$T(a\mathbf{x} + b\mathbf{y}) = aT\mathbf{x} + bT\mathbf{y}. \tag{9.3}$$

Thus linear transformations distribute across addition and pass scalars to the outside. A linear system is one which is of the form

$$T\mathbf{x} = \mathbf{b}.$$

If $T\mathbf{x}_p = \mathbf{b}$, then $\mathbf{x}_p$ is called a particular solution to the linear system.

For example, if $A$ is an $m \times n$ matrix and $T_A$ is determined by

$$T_A(\mathbf{x}) = A\mathbf{x},$$

then from the properties of matrix multiplication, $T_A$ is a linear transformation. In this setting, we will usually write $A$ for the linear transformation as well as the matrix. There are many other examples of linear transformations other than this. In differential equations, you will encounter linear transformations which act on functions to give new functions. In this case, the functions are considered as vectors. Don't worry too much about this at this time. It will happen later. The fundamental idea is that something is linear if 9.3 holds and if whenever $a, b$ are scalars and $\mathbf{x}, \mathbf{y}$ are vectors, $a\mathbf{x} + b\mathbf{y}$ is a vector. That is, you can add vectors and multiply by scalars.

Definition 9.2.11 Let $T$ be a linear transformation. Define

$$\ker(T) \equiv \{\mathbf{x} : T\mathbf{x} = \mathbf{0}\}.$$

In words, $\ker(T)$ is called the kernel of $T$. As just described, $\ker(T)$ consists of the set of all vectors which $T$ sends to $\mathbf{0}$. This is also called the null space of $T$. It is also called the solution space of the equation $T\mathbf{x} = \mathbf{0}$.

The above definition states that $\ker(T)$ is the set of solutions to the equation

$$T\mathbf{x} = \mathbf{0}.$$

In the case where $T$ is really a matrix, you have been solving such equations for quite some time. However, sometimes linear transformations act on vectors which are not in $\mathbb{F}^n$. There is more on this in Chapter 16, where it is discussed more carefully. However, consider the following familiar example.

Example 9.2.12 Let $\frac{d}{dx}$ denote the linear transformation defined on $X$, the functions which are defined on $\mathbb{R}$ and have a continuous derivative. Find $\ker\left(\frac{d}{dx}\right)$.

The example asks for functions $f$ with the property that $\frac{df}{dx} = 0$. As you know from calculus, these functions are the constant functions. Thus $\ker\left(\frac{d}{dx}\right)$ = constant functions.

When $T$ is a linear transformation, systems of the form $T\mathbf{x} = \mathbf{0}$ are called homogeneous systems. Thus the solution to the homogeneous system is known as $\ker(T)$.

Systems of the form $T\mathbf{x} = \mathbf{b}$ where $\mathbf{b} \neq \mathbf{0}$ are called nonhomogeneous systems. It turns out there is a very interesting and important relation between the solutions to the homogeneous systems and the solutions to the nonhomogeneous systems.

Theorem 9.2.13 Suppose x_p is a solution to the linear system, 

Tx = b 

Then if y is any other solution, there exists x ∈ ker (T) such that 

y = x_p + x. 

Proof: Consider y - x_p = y + (-1) x_p. Then T (y - x_p) = Ty - Tx_p = b - b = 0. Let 
x = y - x_p. Then x ∈ ker (T) and y = x_p + x. ■ 


Sometimes people remember the above theorem in the following form. The solutions to the 
nonhomogeneous system, Tx = b are given by x_p + ker (T) where x_p is a particular solution to 
Tx = b. 

I have been vague about what T is and what x is on purpose. This theorem is completely 
algebraic in nature and will work whenever you have linear transformations. In particular, it will be 
important in differential equations. For now, here is a familiar example. 

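Example 9.2.14 Let 

\[ A = \begin{pmatrix} 1 & 2 & 3 & 0 \\ 2 & 1 & 1 & 2 \\ 4 & 5 & 7 & 2 \end{pmatrix}. \]

Find ker (A). Equivalently, find the solution space to the system of equations Ax = 0. 

This asks you to find {x : Ax = 0}. In other words you are asked to solve the system, Ax = 0. 
Let x = (x, y, z, w)^T. Then this amounts to solving 

\[ \begin{pmatrix} 1 & 2 & 3 & 0 \\ 2 & 1 & 1 & 2 \\ 4 & 5 & 7 & 2 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \\ w \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}. \]

This is the linear system 

x + 2y + 3z = 0 
2x + y + z + 2w = 0 
4x + 5y + 7z + 2w = 0 

and you know how to solve this using row operations (Gauss elimination). Set up the augmented 
matrix 

\[ \left( \begin{array}{cccc|c} 1 & 2 & 3 & 0 & 0 \\ 2 & 1 & 1 & 2 & 0 \\ 4 & 5 & 7 & 2 & 0 \end{array} \right) \]

Then row reduce to obtain the row reduced echelon form, 

\[ \left( \begin{array}{cccc|c} 1 & 0 & -\frac{1}{3} & \frac{4}{3} & 0 \\ 0 & 1 & \frac{5}{3} & -\frac{2}{3} & 0 \\ 0 & 0 & 0 & 0 & 0 \end{array} \right) \]

This yields x = (1/3) z - (4/3) w and y = (2/3) w - (5/3) z. Thus ker (A) consists of vectors of the form, 

\[ \begin{pmatrix} \frac{1}{3} z - \frac{4}{3} w \\ \frac{2}{3} w - \frac{5}{3} z \\ z \\ w \end{pmatrix} = z \begin{pmatrix} \frac{1}{3} \\ -\frac{5}{3} \\ 1 \\ 0 \end{pmatrix} + w \begin{pmatrix} -\frac{4}{3} \\ \frac{2}{3} \\ 0 \\ 1 \end{pmatrix}. \]

Example 9.2.15 The general solution of a linear system of equations is just the set of all solutions. 
Find the general solution to the linear system, 

\[ \begin{pmatrix} 1 & 2 & 3 & 0 \\ 2 & 1 & 1 & 2 \\ 4 & 5 & 7 & 2 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \\ w \end{pmatrix} = \begin{pmatrix} 9 \\ 7 \\ 25 \end{pmatrix} \]

given that (1, 1, 2, 1)^T = (x, y, z, w)^T is one solution. 

Note the matrix on the left is the same as the matrix in Example 9.2.14. Therefore, from Theorem 
9.2.13, you will obtain all solutions to the above linear system in the form 

\[ z \begin{pmatrix} \frac{1}{3} \\ -\frac{5}{3} \\ 1 \\ 0 \end{pmatrix} + w \begin{pmatrix} -\frac{4}{3} \\ \frac{2}{3} \\ 0 \\ 1 \end{pmatrix} + \begin{pmatrix} 1 \\ 1 \\ 2 \\ 1 \end{pmatrix}. \]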
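The relation between ker (A) and the general solution in the two examples above can be checked by machine. The following is a minimal sketch in Python using exact fractions. It assumes the matrix has rows (1, 2, 3, 0), (2, 1, 1, 2), (4, 5, 7, 2), which is what the three equations of Example 9.2.14 give, and the names matvec and general_solution are hypothetical helpers introduced only for this illustration.

```python
from fractions import Fraction as F

# Data taken from Examples 9.2.14 and 9.2.15 (matrix read off from the equations).
A = [[1, 2, 3, 0],
     [2, 1, 1, 2],
     [4, 5, 7, 2]]
b = [9, 7, 25]
x_p = [1, 1, 2, 1]                  # the given particular solution
k1 = [F(1, 3), F(-5, 3), 1, 0]      # kernel vector multiplying z
k2 = [F(-4, 3), F(2, 3), 0, 1]      # kernel vector multiplying w


def matvec(M, v):
    """Return the product M v as a list."""
    return [sum(m * t for m, t in zip(row, v)) for row in M]


def general_solution(z, w):
    """Return x_p + z*k1 + w*k2 for the chosen parameters z and w."""
    return [p + z * a + w * c for p, a, c in zip(x_p, k1, k2)]


# Both kernel vectors are sent to 0, and x_p is sent to b.
print(matvec(A, k1), matvec(A, k2), matvec(A, x_p))

# Every choice of the parameters still solves Ax = b, as Theorem 9.2.13 predicts.
for z, w in [(0, 0), (3, -6), (F(1, 2), 7)]:
    assert matvec(A, general_solution(z, w)) == b
```

Exact fractions are used so that the checks are equalities rather than approximate comparisons.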



9.3 Exercises 

1. Study the definition of a linear transformation. State it from memory. 

2. Show the map T : R^n → R^m defined by T (x) = Ax where A is an m x n matrix and x is an 
n x 1 column vector is a linear transformation. 

3. Find the matrix for the linear transformation which rotates every vector in R^2 through an 
angle of π/3. 

4. Find the matrix for the linear transformation which rotates every vector in R^2 through an 
angle of π/4. 

5. Find the matrix for the linear transformation which rotates every vector in R^2 through an 
angle of -π/3. 

6. Find the matrix for the linear transformation which rotates every vector in R^2 through an 
angle of 2π/3. 

7. Find the matrix for the linear transformation which rotates every vector in R^2 through an 
angle of π/12. Hint: Note that π/12 = π/3 - π/4. 

8. Find the matrix for the linear transformation which rotates every vector in R^2 through an 
angle of 2π/3 and then reflects across the x axis. 

9. Find the matrix for the linear transformation which rotates every vector in R^2 through an 
angle of π/3 and then reflects across the x axis. 

10. Find the matrix for the linear transformation which rotates every vector in R^2 through an 
angle of π/4 and then reflects across the x axis. 

11. Find the matrix for the linear transformation which rotates every vector in R^2 through an 
angle of π/6 and then reflects across the x axis followed by a reflection across the y axis. 

12. Find the matrix for the linear transformation which reflects every vector in R^2 across the x 
axis and then rotates every vector through an angle of π/4. 

13. Find the matrix for the linear transformation which reflects every vector in R^2 across the y 
axis and then rotates every vector through an angle of π/4. 

14. Find the matrix for the linear transformation which reflects every vector in R^2 across the x 
axis and then rotates every vector through an angle of π/6. 

15. Find the matrix for the linear transformation which reflects every vector in R^2 across the y 
axis and then rotates every vector through an angle of π/6. 

16. Find the matrix for the linear transformation which rotates every vector in R^2 through an 
angle of 5π/12. Hint: Note that 5π/12 = 2π/3 - π/4. 

17. Find the matrix of the linear transformation which rotates every vector in R^3 counter clockwise 
about the z axis when viewed from the positive z axis through an angle of 30° and then reflects 
through the xy plane. 

18. Find the matrix for proj_u (v) where u = (1, -2, 3). 

19. Find the matrix for proj_u (v) where u = (1, 5, 3). 

20. Find the matrix for proj_u (v) where u = (1, 0, 3). 

21. Show that the function T_u defined by T_u (v) = v - proj_u (v) is also a linear transformation. 

22. Show that 

(v - proj_u (v), u) = (v - proj_u (v)) · u = 0 

and conclude every vector in R^n can be written as the sum of two vectors, one which is 
perpendicular and one which is parallel to the given vector. 

23. Here are some descriptions of functions mapping R^n to R^n. 

(a) T multiplies the jth component of x by a nonzero number b. 

(b) T replaces the ith component of x with b times the jth component added to the ith 
component. 

(c) T switches two components. 

Show these functions are linear and describe their matrices. 

24. In Problem 23, sketch the effects of the linear transformations on the unit square in R^2. Give 
a geometric description of an arbitrary invertible matrix in terms of products of the 
special matrices in Problem 23. 

25. Let u = (a, b) be a unit vector in R^2. Find the matrix which reflects all vectors across this 
vector. 

Hint: You might want to notice that (a, b) = (cos θ, sin θ) for some θ. First rotate through -θ. 
Next reflect through the x axis which is easy. Finally rotate through θ. 

26. Let u be a unit vector. Show the linear transformation of the matrix I - 2uu^T preserves all 
distances and satisfies 

(I - 2uu^T)^T (I - 2uu^T) = I. 

This matrix is called a Householder reflection. More generally, any matrix Q which satisfies 
Q^T Q = QQ^T = I is called an orthogonal matrix. Show the linear transformation determined by 
an orthogonal matrix always preserves the length of a vector in R^n. Hint: First either recall, 
depending on whether you have done Problem 51 on Page 124, or show that for any matrix A, 

(Ax, y) = (x, A^T y) 

27. Suppose |x| = |y| for x, y ∈ R^n. The problem is to find an orthogonal transformation Q, (see 
Problem 26) which has the property that Qx = y and Qy = x. Show 

\[ Q \equiv I - 2 \frac{(x - y)(x - y)^T}{|x - y|^2} \]

does what is desired. 

28. Let a be a fixed vector. The function T_a defined by T_a v = a + v has the effect of translating 
all vectors by adding a. Show this is not a linear transformation. Explain why it is not possible 
to realize T_a in R^3 by multiplying by a 3 x 3 matrix. 

29. In spite of Problem 28 we can represent both translations and rotations by matrix multiplication 
at the expense of using higher dimensions. This is done by the homogeneous coordinates. 
I will illustrate in R^3 where most interest in this is found. For each vector v = (v_1, v_2, v_3)^T, 
consider the vector in R^4 (v_1, v_2, v_3, 1)^T. What happens when you do 

\[ \begin{pmatrix} 1 & 0 & 0 & a_1 \\ 0 & 1 & 0 & a_2 \\ 0 & 0 & 1 & a_3 \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} v_1 \\ v_2 \\ v_3 \\ 1 \end{pmatrix} ? \]

Describe how to consider both rotations and translations all at once by forming appropriate 
4 x 4 matrices. 

30. Write the solution set of the following system as the span of vectors and find a basis for the 
solution space of the following system. 




31. Using Problem 30 find the general solution to the following linear system. 





32. Write the solution set of the following system as the span of vectors and find a basis for the 
solution space of the following system. 




33. Using Problem 32 find the general solution to the following linear system. 



1 



1 




34. Write the solution set of the following system as the span of vectors and find a basis for the 
solution space of the following system. 

35. Using Problem 34 find the general solution to the following linear system. 



-1 2 
-2 
-4 4 



x 

y 



l 

2 
4 



36. Write the solution set of the following system as the span of vectors and find a basis for the 
solution space of the following system. 



0-12 
1 1 
1 -2 5 



x 

y 



37. Using Problem 36 find the general solution to the following linear system. 

1 



0-12 
1 1 
1 -2 5 



x 

y 

z 



-1 



1 




38. Write the solution set of the following system as the span of vectors and find a basis for the 
solution space of the following system. 

( 1 1 1 \ / x \ ( 0\ 

1-110 y 

3-132 z ~ 

\S 3 3 / \ w J \ J 

39. Using Problem 38 find the general solution to the following linear system. 

( 1 1 1 \ / x \ ( l\ 

1-110 y 2 

3-132 z ~ 4 

\S 3 3 J \ w J \ 3 J 

40. Write the solution set of the following system as the span of vectors and find a basis for the 
solution space of the following system. 

( 1 1 1 \ / x \ ( 0\ 

2 112 y _ 

10 11 z ~ 

\0 0/\^/ \ J 

41. Using Problem 40 find the general solution to the following linear system. 



/ 1 1 1 \ 


( x \ 




( 2 \ 


2 112 




y 




-1 


10 11 




z 




-3 


V -1 1 1 ) 




\w ) 




1 o J 



42. Give an example of a 3 x 2 matrix with the property that the linear transformation determined 
by this matrix is one to one but not onto. 

43. Write the solution set of the following system as the span of vectors and find a basis for the 
solution space of the following system. 



( 1 1 1 \ / x \ 

1-110 y 

3 112 z 

\ 3 3 03/\w/ 






v o y 



44. Using Problem 43 find the general solution to the following linear system. 

(I 1 1 \ / x \ ( l\ 

1-110 y 2 

3 112 z ~ 4 

\3 3 3 / \ w J \ 3 J 

45. Write the solution set of the following system as the span of vectors and find a basis for the 
solution space of the following system. 



/ 1 1 1 \ 




( X ) 




f°\ 


2 112 




y 







10 11 




z 







V -1 1 1 j 




\w ) 




w 
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46. Using Problem 45 find the general solution to the following linear system. 



/i 

2 

1 



1\ 



2 
1 

1/ 



( x\ 

y 
\w J 



2 \ 
-1 
-3 

1 



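47. Find ker (A) for 

\[ A = \begin{pmatrix} 1 & 2 & 3 & 2 & 1 \\ 0 & 2 & 1 & 1 & 2 \\ 1 & 4 & 4 & 3 & 3 \\ 0 & 2 & 1 & 1 & 2 \end{pmatrix}. \]

Recall ker (A) is just the set of solutions to Ax = 0. It is the solution space to the system 
Ax = 0. 

48. Using Problem 47, find the general solution to the following linear system. 

\[ \begin{pmatrix} 1 & 2 & 3 & 2 & 1 \\ 0 & 2 & 1 & 1 & 2 \\ 1 & 4 & 4 & 3 & 3 \\ 0 & 2 & 1 & 1 & 2 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{pmatrix} = \begin{pmatrix} 11 \\ 7 \\ 18 \\ 7 \end{pmatrix} \]

49. Using Problem 47, find the general solution to the following linear system. 

\[ \begin{pmatrix} 1 & 2 & 3 & 2 & 1 \\ 0 & 2 & 1 & 1 & 2 \\ 1 & 4 & 4 & 3 & 3 \\ 0 & 2 & 1 & 1 & 2 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{pmatrix} = \begin{pmatrix} 6 \\ 7 \\ 13 \\ 7 \end{pmatrix} \]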

50. Suppose Ax = b has a solution. Explain why the solution is unique precisely when Ax = 0 
has only the trivial (zero) solution. 

51. Show that if A is an m x n matrix, then ker (A) is a subspace. 

52. Verify the linear transformation determined by the matrix of 9.2 maps R^3 onto R^2 but the 
linear transformation determined by this matrix is not one to one. 

10.1 Definition Of An LU factorization 

An LU factorization of a matrix involves writing the given matrix as the product of a lower triangular 
matrix which has the main diagonal consisting entirely of ones L, and an upper triangular matrix U 
in the indicated order. This is the version discussed here but it is sometimes the case that the L has 
numbers other than 1 down the main diagonal. It is still a useful concept. The L goes with "lower" 
and the U with "upper". It turns out many matrices can be written in this way and when this is 
possible, people get excited about slick ways of solving the system of equations, Ax = y. It is for 
this reason that you want to study the LU factorization. It allows you to work only with triangular 
matrices. It turns out that it takes about half as many operations to obtain an LU factorization as 
it does to find the row reduced echelon form. 

First it should be noted not all matrices have an LU factorization and so we will emphasize the 
techniques for achieving it rather than formal proofs. 

Example 10.1.1 Can you write $\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$ in the form LU as just described? 

To do so you would need 

\[ \begin{pmatrix} 1 & 0 \\ x & 1 \end{pmatrix} \begin{pmatrix} a & b \\ 0 & c \end{pmatrix} = \begin{pmatrix} a & b \\ xa & xb + c \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}. \]

Therefore, b = 1 and a = 0. Also, from the bottom rows, xa = 1 which can't happen when 
a = 0. Therefore, you can't write this matrix in the form LU. It has no LU factorization. This is 
what we mean above by saying the method lacks generality. 

10.2 Finding An LU Factorization By Inspection 

Which matrices have an LU factorization? It turns out it is those whose row reduced echelon form 
can be achieved without switching rows and which only involve row operations of type 3 in which 
row j is replaced with a multiple of row i added to row j for i < j. 

Example 10.2.1 Find an LU factorization of $A = \begin{pmatrix} 1 & 2 & 0 & 2 \\ 1 & 3 & 2 & 1 \\ 2 & 3 & 4 & 0 \end{pmatrix}$. 

One way to find the LU factorization is to simply look for it directly. You need 

\[ \begin{pmatrix} 1 & 2 & 0 & 2 \\ 1 & 3 & 2 & 1 \\ 2 & 3 & 4 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ x & 1 & 0 \\ y & z & 1 \end{pmatrix} \begin{pmatrix} a & d & h & j \\ 0 & b & e & i \\ 0 & 0 & c & f \end{pmatrix}. \]

Then multiplying these you get 

\[ \begin{pmatrix} a & d & h & j \\ xa & xd + b & xh + e & xj + i \\ ya & yd + zb & yh + ze + c & yj + zi + f \end{pmatrix} \]

and so you can now tell what the various quantities equal. From the first column, you need a = 1, x = 
1, y = 2. Now go to the second column. You need d = 2, xd + b = 3 so b = 1, yd + zb = 3 so z = -1. 
From the third column, h = 0, e = 2, c = 6. Now from the fourth column, j = 2, i = -1, f = -5. 
Therefore, an LU factorization is 

\[ \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 2 & -1 & 1 \end{pmatrix} \begin{pmatrix} 1 & 2 & 0 & 2 \\ 0 & 1 & 2 & -1 \\ 0 & 0 & 6 & -5 \end{pmatrix}. \]

You can check whether you got it right by simply multiplying these two. 

10.3 Using Multipliers To Find An LU Factorization 

There is also a convenient procedure for finding an LU factorization. It turns out that it is only 
necessary to keep track of the multipliers which are used to row reduce to upper triangular form. 
This procedure is described in the following examples. 



Example 10.3.1 Find an LU factorization for $A = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 1 & -4 \\ 1 & 5 & 2 \end{pmatrix}$. 

Write the matrix next to the identity matrix as shown. 

\[ \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 2 & 3 \\ 2 & 1 & -4 \\ 1 & 5 & 2 \end{pmatrix} \]

The process involves doing row operations to the matrix on the right while simultaneously updating 
successive columns of the matrix on the left. First take -2 times the first row and add to the second 
in the matrix on the right. 

\[ \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 2 & 3 \\ 0 & -3 & -10 \\ 1 & 5 & 2 \end{pmatrix} \]

Note the way we updated the matrix on the left. We put a 2 in the second entry of the first column 
because we used -2 times the first row added to the second row. Now replace the third row in the 
matrix on the right by -1 times the first row added to the third. Notice that the product of the 
two matrices is unchanged and equals the original matrix. This is because a row operation was done 
on the original matrix to get the matrix on the right and then on the left, it was multiplied by an 
elementary matrix which "undid" the row operation which was done. 
The next step is 

\[ \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 2 & 3 \\ 0 & -3 & -10 \\ 0 & 3 & -1 \end{pmatrix} \]

Again, the product is unchanged because we just did and then undid a row operation. Finally, we 
will add the second row to the bottom row and make the following changes 

\[ \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 1 & -1 & 1 \end{pmatrix} \begin{pmatrix} 1 & 2 & 3 \\ 0 & -3 & -10 \\ 0 & 0 & -11 \end{pmatrix}. \]

At this point, we stop because the matrix on the right is upper triangular. An LU factorization is 
the above. 

The justification for this gimmick will be given later in a more general context. 



Example 10.3.2 Find an LU factorization for $A = \begin{pmatrix} 1 & 2 & 1 & 2 & 1 \\ 2 & 0 & 2 & 1 & 1 \\ 2 & 3 & 1 & 3 & 2 \\ 1 & 0 & 1 & 1 & 2 \end{pmatrix}$. 

We will use the same procedure as above. However, this time we will do everything for one 
column at a time. First multiply the first row by (-1) and then add to the last row. Next take (-2) 
times the first and add to the second and then (-2) times the first and add to the third. 

\[ \begin{pmatrix} 1 & 0 & 0 & 0 \\ 2 & 1 & 0 & 0 \\ 2 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 2 & 1 & 2 & 1 \\ 0 & -4 & 0 & -3 & -1 \\ 0 & -1 & -1 & -1 & 0 \\ 0 & -2 & 0 & -1 & 1 \end{pmatrix} \]

This finishes the first column of L and the first column of U. As in the above, what happened was 
this. Lots of row operations were done and then these were undone by multiplying by the matrix 
on the left. Thus the above product equals the original matrix. Now take -(1/4) times the second 
row in the matrix on the right and add to the third followed by -(1/2) times the second added to 
the last. 

\[ \begin{pmatrix} 1 & 0 & 0 & 0 \\ 2 & 1 & 0 & 0 \\ 2 & 1/4 & 1 & 0 \\ 1 & 1/2 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 2 & 1 & 2 & 1 \\ 0 & -4 & 0 & -3 & -1 \\ 0 & 0 & -1 & -1/4 & 1/4 \\ 0 & 0 & 0 & 1/2 & 3/2 \end{pmatrix} \]

This finishes the second column of L as well as the second column of U. Since the matrix on the 
right is upper triangular, stop. The LU factorization has now been obtained. This technique is 
called Doolittle's method. 

This process is entirely typical of the general case. The matrix U is just the first upper triangular 
matrix you come to in your quest for the row reduced echelon form using only the row operation 
which involves replacing a row by itself added to a multiple of another row. The matrix L is what 
you get by updating the identity matrix as illustrated above. 
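The multiplier procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a library routine: the function name lu_by_multipliers is hypothetical, exact fractions are used to avoid rounding, and the test matrix has rows (1, 2, 3), (2, 1, -4), (1, 5, 2), which is consistent with the steps shown in Example 10.3.1.

```python
from fractions import Fraction


def lu_by_multipliers(A):
    """Return (L, U) with A = L U, using only 'add a multiple of a row to a lower row'.

    Raises ValueError when a zero pivot has a nonzero entry below it,
    which is the situation where a row switch would be needed.
    """
    m, n = len(A), len(A[0])
    U = [[Fraction(x) for x in row] for row in A]
    L = [[Fraction(int(i == j)) for j in range(m)] for i in range(m)]
    row = 0
    for col in range(n):
        if row >= m:
            break
        if U[row][col] == 0:
            if any(U[i][col] != 0 for i in range(row + 1, m)):
                raise ValueError("a row switch would be needed")
            continue  # nothing to eliminate in this column
        for i in range(row + 1, m):
            mult = U[i][col] / U[row][col]
            # Subtract mult times the pivot row from row i ...
            U[i] = [u - mult * p for u, p in zip(U[i], U[row])]
            # ... and record mult in L, which "undoes" that row operation.
            L[i][row] = mult
        row += 1
    return L, U


L, U = lu_by_multipliers([[1, 2, 3], [2, 1, -4], [1, 5, 2]])
print(L)
print(U)
```

Note that the entry stored in L is the negative of the number by which the pivot row is multiplied before it is added, exactly as in the examples, where using -2 times the first row puts a 2 in L.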

You should note that for a square matrix, the number of row operations necessary to reduce to 
LU form is about half the number needed to place the matrix in row reduced echelon form. This is 
why an LU factorization is of interest in solving systems of equations. 

10.4 Solving Systems Using The LU Factorization 

One reason people care about the LU factorization is it allows the quick solution of systems of 
equations. Here is an example. 

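Example 10.4.1 Suppose you want to find the solutions to 

\[ \begin{pmatrix} 1 & 2 & 3 & 2 \\ 4 & 3 & 1 & 1 \\ 1 & 2 & 3 & 0 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \\ w \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}. \]

Of course one way is to write the augmented matrix and grind away. However, this involves 
more row operations than the computation of the LU factorization and it turns out that the LU 
factorization can give the solution quickly. Here is how. The following is an LU factorization for 
the matrix. 

\[ \begin{pmatrix} 1 & 2 & 3 & 2 \\ 4 & 3 & 1 & 1 \\ 1 & 2 & 3 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 4 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 2 & 3 & 2 \\ 0 & -5 & -11 & -7 \\ 0 & 0 & 0 & -2 \end{pmatrix} \]

Let Ux = y and consider Ly = b where in this case, b = (1, 2, 3)^T. Thus 

\[ \begin{pmatrix} 1 & 0 & 0 \\ 4 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix} \begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} \]

which yields very quickly that y = (1, -2, 2)^T. Now you can find x by solving Ux = y. Thus in this 
case, 

\[ \begin{pmatrix} 1 & 2 & 3 & 2 \\ 0 & -5 & -11 & -7 \\ 0 & 0 & 0 & -2 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \\ w \end{pmatrix} = \begin{pmatrix} 1 \\ -2 \\ 2 \end{pmatrix} \]

which yields 

\[ \begin{pmatrix} x \\ y \\ z \\ w \end{pmatrix} = \begin{pmatrix} -\frac{3}{5} + \frac{7}{5} t \\ \frac{9}{5} - \frac{11}{5} t \\ t \\ -1 \end{pmatrix}, \quad t \in \mathbb{R}. \]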
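The two stage solve in the example above (first Ly = b, then Ux = y) can be sketched as follows. This is a hedged illustration: the factors L and U are the ones consistent with the numbers in Example 10.4.1, and the names forward_substitution, matvec and solution are hypothetical helpers. Because U has a column without a pivot, the back substitution stage is written out as the one-parameter family rather than as a general routine.

```python
from fractions import Fraction as F

L = [[1, 0, 0], [4, 1, 0], [1, 0, 1]]
U = [[1, 2, 3, 2], [0, -5, -11, -7], [0, 0, 0, -2]]
A = [[1, 2, 3, 2], [4, 3, 1, 1], [1, 2, 3, 0]]   # the product L U
b = [1, 2, 3]


def forward_substitution(L, b):
    """Solve L y = b for lower triangular L with ones on the main diagonal."""
    y = []
    for i, row in enumerate(L):
        y.append(b[i] - sum(row[j] * y[j] for j in range(i)))
    return y


def matvec(M, v):
    """Return the product M v as a list."""
    return [sum(m * t for m, t in zip(row, v)) for row in M]


def solution(t):
    """The family found by back substitution in U x = y, with z = t free."""
    return [F(-3, 5) + F(7, 5) * t, F(9, 5) - F(11, 5) * t, t, -1]


y = forward_substitution(L, b)
print(y)  # [1, -2, 2]

# Each member of the family satisfies both U x = y and A x = b.
for t in [0, 1, F(-5, 2)]:
    assert matvec(U, solution(t)) == y
    assert matvec(A, solution(t)) == b
```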



10.5 Justification For The Multiplier Method 

Why does the multiplier method work for finding the LU factorization? Suppose A is a matrix 
which has the property that the row reduced echelon form for A may be achieved using only the 
row operations which involve replacing a row with itself added to a multiple of another row. It is 
not ever necessary to switch rows. Thus every row which is replaced using this row operation in 
obtaining the echelon form may be modified by using a row which is above it. 

Lemma 10.5.1 Let L be a lower (upper) triangular m x m matrix which has ones down the main 
diagonal. Then L^{-1} also is a lower (upper) triangular matrix which has ones down the main diagonal. 
In the case that the only nonzero entries of L off the main diagonal lie in a single column, L^{-1} is 
obtained from L by simply multiplying each entry below the main diagonal in L with -1. 

Proof: Consider the usual setup for finding the inverse ( L I ). Then each row operation 
done to L to reduce to row reduced echelon form results in changing only the entries in I below the 
main diagonal. In the special case where the off diagonal entries of L lie in a single column, the 
resulting entry on the right of the above m x 2m matrix below the main diagonal is just -1 times 
the corresponding entry in L. ■ 

Now let A be an m x n matrix, say 

\[ A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix} \]

and assume A can be row reduced to an upper triangular form using only row operation 3. Thus, 
in particular, $a_{11} \neq 0$. Multiply on the left by 

\[ E_1 = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ -\frac{a_{21}}{a_{11}} & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ -\frac{a_{m1}}{a_{11}} & 0 & \cdots & 1 \end{pmatrix}. \]

This is the product of elementary matrices which make modifications in the first column only. It is 
equivalent to taking $-a_{21}/a_{11}$ times the first row and adding to the second, then taking $-a_{31}/a_{11}$ 
times the first row and adding to the third, and so forth. The quotients in the first column of the 
above matrix are the multipliers. Thus the result is of the form 

\[ E_1 A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ 0 & a'_{22} & \cdots & a'_{2n} \\ \vdots & \vdots & & \vdots \\ 0 & a'_{m2} & \cdots & a'_{mn} \end{pmatrix}. \]

By assumption, $a'_{22} \neq 0$ and so it is possible to use this entry to zero out all the entries below it in 
the matrix on the right by multiplication by a matrix of the form $E_2 = \begin{pmatrix} 1 & 0 \\ 0 & E \end{pmatrix}$ where E is an 
(m - 1) x (m - 1) matrix of the form 

\[ E = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ -\frac{a'_{32}}{a'_{22}} & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ -\frac{a'_{m2}}{a'_{22}} & 0 & \cdots & 1 \end{pmatrix}. \]

Again, the entries in the first column below the 1 are the multipliers. Continuing this way, zeroing 
out the entries below the diagonal entries, finally leads to 

\[ E_{m-1} E_{m-2} \cdots E_1 A = U \]

where U is upper triangular. Each $E_j$ has all ones down the main diagonal and is lower triangular. 
Now multiply both sides by the inverses of the $E_j$ in the reverse order. This yields 

\[ A = E_1^{-1} E_2^{-1} \cdots E_{m-1}^{-1} U \]

By Lemma 10.5.1, this implies that the product of those $E_j^{-1}$ is a lower triangular matrix having all 
ones down the main diagonal. 

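The above discussion and lemma gives the justification for the multiplier method. The expressions 

\[ -a_{21}/a_{11}, \; -a_{31}/a_{11}, \; \cdots, \; -a_{m1}/a_{11} \]

denoted respectively by $m_{21}, \cdots, m_{m1}$ to save notation which were obtained in building $E_1$ are the 
multipliers. Then according to the lemma, to find $E_1^{-1}$ you simply write 

\[ \begin{pmatrix} 1 & 0 & \cdots & 0 \\ -m_{21} & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ -m_{m1} & 0 & \cdots & 1 \end{pmatrix} \]

Similar considerations apply to the other $E_j^{-1}$. Thus L is of the form 

\[ \begin{pmatrix} 1 & 0 & \cdots & 0 & 0 \\ -m_{21} & 1 & \cdots & 0 & 0 \\ \vdots & 0 & \ddots & \vdots & \vdots \\ -m_{(m-1)1} & 0 & \cdots & 1 & 0 \\ -m_{m1} & 0 & \cdots & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & -m_{32} & \ddots & \vdots & \vdots \\ 0 & \vdots & \cdots & 1 & 0 \\ 0 & -m_{m2} & \cdots & 0 & 1 \end{pmatrix} \cdots \begin{pmatrix} 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & 0 & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & 0 \\ 0 & 0 & \cdots & -m_{m(m-1)} & 1 \end{pmatrix} \]

It follows from Theorem 8.1.6 about the effect of multiplying on the left by an elementary matrix 
that the above product is of the form 

\[ \begin{pmatrix} 1 & 0 & \cdots & 0 & 0 \\ -m_{21} & 1 & \cdots & 0 & 0 \\ \vdots & -m_{32} & \ddots & \vdots & \vdots \\ -m_{(m-1)1} & \vdots & \cdots & 1 & 0 \\ -m_{m1} & -m_{m2} & \cdots & -m_{m(m-1)} & 1 \end{pmatrix} \]

In words, beginning at the left column and moving toward the right, you simply insert, into the 
corresponding position in the identity matrix, -1 times the multiplier which was used to zero out 
an entry in that position below the main diagonal in A, while retaining the main diagonal which 
consists entirely of ones. This is L. 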
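As a small numerical check of this argument, the sketch below builds the matrices E_1 and E_2 for the 3 x 3 matrix with rows (1, 2, 3), (2, 1, -4), (1, 5, 2) (the matrix consistent with Example 10.3.1), inverts them by negating the entries below the main diagonal, and multiplies the inverses. The helper names matmul and negate_below_diagonal are hypothetical and introduced only for this illustration.

```python
def matmul(X, Y):
    """Return the matrix product X Y for matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def negate_below_diagonal(E):
    """Multiply every entry below the main diagonal by -1."""
    return [[-e if i > j else e for j, e in enumerate(row)]
            for i, row in enumerate(E)]


A = [[1, 2, 3], [2, 1, -4], [1, 5, 2]]
I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

E1 = [[1, 0, 0], [-2, 1, 0], [-1, 0, 1]]   # multipliers -2 and -1 clear column 1
E2 = [[1, 0, 0], [0, 1, 0], [0, 1, 1]]     # multiplier 1 clears column 2

U = matmul(E2, matmul(E1, A))

# Each E_j has its off diagonal entries in a single column, so the lemma applies.
E1_inv = negate_below_diagonal(E1)
E2_inv = negate_below_diagonal(E2)
assert matmul(E1, E1_inv) == I3 and matmul(E2, E2_inv) == I3

# The product of the inverses simply collects -1 times each multiplier.
L = matmul(E1_inv, E2_inv)
print(L)
print(U)
assert matmul(L, U) == A
```

The negation trick is applied to each E_j separately, not to the assembled L, since the lemma's shortcut is only claimed for a single nonzero column.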

10.6 The PLU Factorization 

As indicated above, some matrices don't have an LU factorization. Here is an example. 

\[ M = \begin{pmatrix} 1 & 2 & 3 & 2 \\ 1 & 2 & 3 & 0 \\ 4 & 3 & 1 & 1 \end{pmatrix} \qquad (10.1) \]

In this case, there is another factorization which is useful called a PLU factorization. Here P is a 
permutation matrix. 

Example 10.6.1 Find a PLU factorization for the above matrix in 10.1. 

Proceed as before trying to find the row echelon form of the matrix. First add -1 times the first 
row to the second row and then add -4 times the first to the third. This yields 

\[ \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 4 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 2 & 3 & 2 \\ 0 & 0 & 0 & -2 \\ 0 & -5 & -11 & -7 \end{pmatrix} \]

There is no way to do only row operations involving replacing a row with itself added to a multiple 
of another row to the matrix on the right in such a way as to obtain an upper triangular matrix. 
Therefore, consider the original matrix with the bottom two rows switched. 

\[ M' = \begin{pmatrix} 1 & 2 & 3 & 2 \\ 4 & 3 & 1 & 1 \\ 1 & 2 & 3 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} 1 & 2 & 3 & 2 \\ 1 & 2 & 3 & 0 \\ 4 & 3 & 1 & 1 \end{pmatrix} = PM \]

Now try again with this matrix. First take -1 times the first row and add to the bottom row and 
then take -4 times the first row and add to the second row. This yields 

\[ \begin{pmatrix} 1 & 0 & 0 \\ 4 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 2 & 3 & 2 \\ 0 & -5 & -11 & -7 \\ 0 & 0 & 0 & -2 \end{pmatrix} \]

The matrix on the right is upper triangular and so the LU factorization of the matrix M' has been 
obtained above. 

Thus M' = PM = LU where L and U are given above. Notice that P^2 = I and therefore, 
M = P^2 M = PLU and so 

\[ \begin{pmatrix} 1 & 2 & 3 & 2 \\ 1 & 2 & 3 & 0 \\ 4 & 3 & 1 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 4 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 2 & 3 & 2 \\ 0 & -5 & -11 & -7 \\ 0 & 0 & 0 & -2 \end{pmatrix} \]

This process can always be followed and so there always exists a PLU factorization of a given 
matrix even though there isn't always an LU factorization. 
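The idea of switching rows only when the multiplier method gets stuck can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the only way to organize the computation: the function name plu is hypothetical, a row is switched with the first lower row having a nonzero entry (no attempt is made at numerically motivated pivoting), and the test matrix is the matrix M of 10.1 as reconstructed from Example 10.6.2.

```python
from fractions import Fraction


def matmul(X, Y):
    """Return the matrix product X Y for matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def plu(M):
    """Return (P, L, U) with M = P L U, switching rows only when a pivot is zero."""
    m, n = len(M), len(M[0])
    U = [[Fraction(x) for x in row] for row in M]
    L = [[Fraction(0)] * m for _ in range(m)]   # multipliers only; ones added at the end
    perm = list(range(m))                       # perm[i] = original index of current row i
    row = 0
    for col in range(n):
        if row >= m:
            break
        pivot = next((i for i in range(row, m) if U[i][col] != 0), None)
        if pivot is None:
            continue                            # column is already zero from this row down
        if pivot != row:
            # Switch the rows of U, the multipliers stored so far, and the bookkeeping.
            U[row], U[pivot] = U[pivot], U[row]
            L[row], L[pivot] = L[pivot], L[row]
            perm[row], perm[pivot] = perm[pivot], perm[row]
        for i in range(row + 1, m):
            mult = U[i][col] / U[row][col]
            U[i] = [u - mult * p for u, p in zip(U[i], U[row])]
            L[i][row] = mult
        row += 1
    for i in range(m):
        L[i][i] = Fraction(1)
    # Row i of L U is row perm[i] of M, so P has a 1 in position (perm[i], i).
    P = [[int(perm[j] == i) for j in range(m)] for i in range(m)]
    return P, L, U


M = [[1, 2, 3, 2], [1, 2, 3, 0], [4, 3, 1, 1]]
P, L, U = plu(M)
print(P)
assert matmul(P, matmul(L, U)) == M
```

For this M the sketch switches the bottom two rows, which is the same choice made in Example 10.6.1.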

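Example 10.6.2 Use the PLU factorization of $M = \begin{pmatrix} 1 & 2 & 3 & 2 \\ 1 & 2 & 3 & 0 \\ 4 & 3 & 1 & 1 \end{pmatrix}$ to solve the system Mx = b 
where b = (1, 2, 3)^T. 

Let Ux = y and consider PLy = b. In other words, solve, 

\[ \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 4 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix} \begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}. \]

Multiplying both sides by P gives 

\[ \begin{pmatrix} 1 & 0 & 0 \\ 4 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix} \begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix} = \begin{pmatrix} 1 \\ 3 \\ 2 \end{pmatrix} \]

and so 

\[ y = \begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix} = \begin{pmatrix} 1 \\ -1 \\ 1 \end{pmatrix}. \]

Now Ux = y and so it only remains to solve 

\[ \begin{pmatrix} 1 & 2 & 3 & 2 \\ 0 & -5 & -11 & -7 \\ 0 & 0 & 0 & -2 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = \begin{pmatrix} 1 \\ -1 \\ 1 \end{pmatrix} \]

which yields 

\[ \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = \begin{pmatrix} \frac{1}{5} + \frac{7}{5} t \\ \frac{9}{10} - \frac{11}{5} t \\ t \\ -\frac{1}{2} \end{pmatrix}, \quad t \in \mathbb{R}. \]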



10.7 The QR Factorization 

As pointed out above, the LU factorization is not a mathematically respectable thing because it 
does not always exist. There is another factorization which does always exist. Much more can be 
said about it than I will say here. I will only deal with real matrices and so the dot product will be 
the usual real dot product. 

Definition 10.7.1 An n x n real matrix Q is called an orthogonal matrix if 

QQ^T = Q^T Q = I. 

Thus an orthogonal matrix is one whose inverse is equal to its transpose. 
First note that if a matrix is orthogonal this says 

\[ \sum_j Q^T_{ij} Q_{jk} = \sum_j Q_{ji} Q_{jk} = \delta_{ik}. \]

Thus 

\[ |Qx|^2 = \sum_i \Big( \sum_j Q_{ij} x_j \Big)^2 = \sum_i \sum_r \sum_s Q_{is} x_s Q_{ir} x_r \]
\[ = \sum_i \sum_r \sum_s Q_{is} Q_{ir} x_s x_r = \sum_r \sum_s \sum_i Q_{is} Q_{ir} x_s x_r \]
\[ = \sum_r \sum_s \delta_{sr} x_s x_r = \sum_r x_r^2 = |x|^2. \]

This shows that orthogonal transformations preserve distances. You can show that if you have a 
matrix which does preserve distances, then it must be orthogonal also. 

Example 10.7.2 One of the most important examples of an orthogonal matrix is the so called 
Householder matrix. You have v a unit vector and you form the matrix 

I - 2vv^T 

This is an orthogonal matrix which is also symmetric. To see this, you use the rules of matrix 
operations. 

\[ (I - 2vv^T)^T = I^T - (2vv^T)^T = I - 2vv^T \]

so it is symmetric. Now to show it is orthogonal, 

\[ (I - 2vv^T)(I - 2vv^T) = I - 2vv^T - 2vv^T + 4vv^T vv^T = I - 4vv^T + 4vv^T = I \]

because v^T v = v · v = |v|^2 = 1. Therefore, this is an example of an orthogonal matrix. 

Consider the following problem. 

Problem 10.7.3 Given two vectors x, y such that |x| = |y| ≠ 0 but x ≠ y and you want an 
orthogonal matrix Q such that Qx = y and Qy = x. The thing which works is the Householder 
matrix 

\[ Q \equiv I - 2 \frac{(x - y)(x - y)^T}{|x - y|^2} \]

Here is why this works. 

\[ Q(x - y) = (x - y) - 2 \frac{(x - y)}{|x - y|^2} (x - y)^T (x - y) = (x - y) - 2 \frac{(x - y)}{|x - y|^2} |x - y|^2 = y - x \]

\[ Q(x + y) = (x + y) - 2 \frac{(x - y)}{|x - y|^2} (x - y)^T (x + y) = (x + y) - 2 \frac{(x - y)}{|x - y|^2} \big( (x - y) \cdot (x + y) \big) \]
\[ = (x + y) - 2 \frac{(x - y)}{|x - y|^2} \big( |x|^2 - |y|^2 \big) = x + y \]

Hence 

Qx + Qy = x + y 
Qx - Qy = y - x 

Adding these equations, 2Qx = 2y and subtracting them yields 2Qy = 2x. 
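A quick computation confirms this for a specific pair of vectors. The sketch below is only an illustration: the vectors x = (3, 4) and y = (5, 0) are chosen here because both have length 5, and the names householder_swap and matvec are hypothetical helpers. Exact fractions are used so the equalities hold exactly.

```python
from fractions import Fraction


def householder_swap(x, y):
    """Return Q = I - 2 (x - y)(x - y)^T / |x - y|^2 as a list of rows.

    Assumes |x| = |y| and x != y, as in Problem 10.7.3.
    """
    d = [Fraction(a) - Fraction(b) for a, b in zip(x, y)]
    d_sq = sum(t * t for t in d)
    n = len(d)
    return [[Fraction(int(i == j)) - 2 * d[i] * d[j] / d_sq for j in range(n)]
            for i in range(n)]


def matvec(M, v):
    """Return the product M v as a list."""
    return [sum(m * t for m, t in zip(row, v)) for row in M]


x, y = [3, 4], [5, 0]          # both have length 5
Q = householder_swap(x, y)
print(matvec(Q, x), matvec(Q, y))

# Q switches x and y, and Q^T Q = I (Q is symmetric, so Q^T Q = Q Q).
assert matvec(Q, x) == y and matvec(Q, y) == x
columns = [list(c) for c in zip(*Q)]
assert [matvec(Q, c) for c in columns] == [[1, 0], [0, 1]]
```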

A picture of the geometric significance follows. 




The orthogonal matrix Q reflects across the dotted line taking x to y and y to x. 

Definition 10.7.4 Let A be an m x n matrix. Then a QR factorization of A consists of two matrices, 
Q orthogonal and R upper triangular or in other words equal to zero below the main diagonal such 
that A = QR. 

With the solution to this simple problem, here is how to obtain a QR factorization for any matrix 
A. Let 

\[ A = (a_1, a_2, \cdots, a_n) \]

where the $a_i$ are the columns. If $a_1 = 0$, let $Q_1 = I$. If $a_1 \neq 0$, let 

\[ b \equiv \begin{pmatrix} |a_1| \\ 0 \\ \vdots \\ 0 \end{pmatrix} \]

and form the Householder matrix 

\[ Q_1 \equiv I - 2 \frac{(a_1 - b)(a_1 - b)^T}{|a_1 - b|^2} \]

As in the above problem $Q_1 a_1 = b$ and so 

\[ Q_1 A = \begin{pmatrix} |a_1| & * \\ 0 & A_2 \end{pmatrix} \]

where $A_2$ is an (m - 1) x (n - 1) matrix. Now find in the same way as was just done an 
(m - 1) x (m - 1) matrix $\widehat{Q}_2$ such that 

\[ \widehat{Q}_2 A_2 = \begin{pmatrix} * & * \\ 0 & A_3 \end{pmatrix} \]

Let 

\[ Q_2 \equiv \begin{pmatrix} 1 & 0 \\ 0 & \widehat{Q}_2 \end{pmatrix}. \]

Then 

\[ Q_2 Q_1 A = \begin{pmatrix} 1 & 0 \\ 0 & \widehat{Q}_2 \end{pmatrix} \begin{pmatrix} |a_1| & * \\ 0 & A_2 \end{pmatrix} = \begin{pmatrix} |a_1| & * & * \\ 0 & * & * \\ 0 & 0 & A_3 \end{pmatrix} \]

Continuing this way until the result is upper triangular, you get a sequence of orthogonal matrices 
$Q_p, Q_{p-1}, \cdots, Q_1$ such that 

\[ Q_p Q_{p-1} \cdots Q_1 A = R \qquad (10.2) \]

where R is upper triangular. 

Now if $Q_1$ and $Q_2$ are orthogonal, then from properties of matrix multiplication, 

\[ Q_1 Q_2 (Q_1 Q_2)^T = Q_1 Q_2 Q_2^T Q_1^T = Q_1 I Q_1^T = I \]

and similarly 

\[ (Q_1 Q_2)^T Q_1 Q_2 = I. \]

Thus the product of orthogonal matrices is orthogonal. Also the transpose of an orthogonal matrix 
is orthogonal directly from the definition. Therefore, from 10.2 

\[ A = (Q_p Q_{p-1} \cdots Q_1)^T R \equiv QR, \]

where Q is orthogonal. This proves the following theorem. 

Theorem 10.7.5 Let A be any real m x n matrix. Then there exists an orthogonal matrix Q and 
an upper triangular matrix R having nonnegative entries down the main diagonal such that 

A = QR 

and this factorization can be accomplished in a systematic manner. 
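The construction in the proof can be turned into a short program. The following Python sketch is an illustration under stated assumptions rather than a production routine: the name householder_qr is hypothetical, floating point arithmetic is used so the results are only approximate, and when a column is already (numerically) equal to its target b the step is skipped, which covers both the case a = 0 and the case a = b where the Householder formula would divide by zero. Numerical libraries usually choose the sign of b differently to reduce rounding error; the choice b = (|a|, 0, ..., 0) is kept here because it matches the text and gives nonnegative diagonal entries in R.

```python
import math


def householder_qr(A):
    """Return (Q, R) with A = Q R, Q orthogonal (m x m), R upper triangular (m x n)."""
    m, n = len(A), len(A[0])
    R = [[float(x) for x in row] for row in A]
    Qt = [[float(i == j) for j in range(m)] for i in range(m)]   # accumulates Q_p ... Q_1
    for k in range(min(m, n)):
        # a is the part of column k on and below the diagonal; b = (|a|, 0, ..., 0).
        a = [R[i][k] for i in range(k, m)]
        norm_a = math.sqrt(sum(t * t for t in a))
        d = a[:]
        d[0] -= norm_a                                           # d = a - b
        d_sq = sum(t * t for t in d)
        if d_sq <= 1e-28 * (1.0 + norm_a * norm_a):
            continue                                             # use the identity here
        # Apply I - 2 d d^T / |d|^2 to rows k, ..., m-1 of both R and Qt.
        for M in (R, Qt):
            for j in range(len(M[0])):
                s = sum(d[i] * M[k + i][j] for i in range(len(d)))
                for i in range(len(d)):
                    M[k + i][j] -= 2.0 * d[i] * s / d_sq
    Q = [[Qt[j][i] for j in range(m)] for i in range(m)]         # Q = (Q_p ... Q_1)^T
    return Q, R


Q, R = householder_qr([[3, 1], [4, 2]])
for row in R:
    print([round(t, 10) for t in row])
```

Each pass applies one Householder matrix to the lower right block only, which is the same as multiplying by the block matrix with an identity in the upper left corner.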



10.8 Exercises 

12 

1. Find an LU factorization of | 2 1 3 

1 2 3 

12 3 2 

2. Find an LU factorization of | 1 3 2 1 

5 13 



3. Find an LU factorization of the matrix 



1-2-5 

-2 5 11 3 

3 -6 -15 1 



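4. Find an LU factorization of the matrix 

\[ \begin{pmatrix} 1 & -1 & -3 & -1 \\ -1 & 2 & 4 & 3 \\ 2 & -3 & -7 & -3 \end{pmatrix} \]

5. Find an LU factorization of the matrix 

\[ \begin{pmatrix} 1 & -3 & -4 & -3 \\ -3 & 10 & 10 & 10 \\ 1 & -6 & 2 & -5 \end{pmatrix} \]

6. Find an LU factorization of the matrix 

\[ \begin{pmatrix} 1 & 3 & 1 & -1 \\ 3 & 10 & 8 & -1 \\ 2 & 5 & -3 & -3 \end{pmatrix} \]

7. Find an LU factorization of the matrix 

\[ \begin{pmatrix} 3 & -2 & 1 \\ 9 & -8 & 6 \\ -6 & 2 & 2 \\ 3 & 2 & -7 \end{pmatrix} \]

8. Find an LU factorization of the matrix 

\[ \begin{pmatrix} -3 & -1 & 3 \\ 9 & 9 & -12 \\ 3 & 19 & -16 \\ 12 & 40 & -26 \end{pmatrix} \]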



9. Find an LU factorization of the matrix 



/-I -3 -1\ 

1 3 

3 9 

\ A 12 16 J 

10. Find the LU factorization of the coefficient matrix using Doolittle's method and use it to solve 
the system of equations. 

x + 2y = 5 
2x + 3y = 6 

11. Find the LU factorization of the coefficient matrix using Doolittle's method and use it to solve 
the system of equations. 

x + 2y + z = 1 
y + 3z = 2 
2x + 3y = 6 

12. Find the LU factorization of the coefficient matrix using Doolittle's method and use it to solve 
the system of equations. 

x + 2y + 3z = 5 
2x + 3y + z = 6 
x - y + z = 2 

13. Find the LU factorization of the coefficient matrix using Doolittle's method and use it to solve 
the system of equations. 

x + 2y + 3z = 5 
2x + 3y + z = 6 
3x + 5y + 4z = 11 

14. Is there only one LU factorization for a given matrix? Hint: Consider the equation 



1 
1 



Look for all possible LU factorizations. 

15. Find a PLU factorization of 

16. Find a PLU factorization of 



1 
1 1 



1 




17. Find a PLU factorization of 



18. Find a PLU factorization of 



and use it to solve the systems 



(a) 



(b) 



/I 


2 


1\ 


2 


4 


1 


1 





2 


\2 


2 


1/ 


/I 


2 


1\ 


2 


4 


1 


1 





2 


V2 


2 


1/ 




19. Find a PLU factorization of 



and use it to solve the systems 
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(a) 



(b) 
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z 
\w J 
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1 
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2 
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20. Find a QR factorization for the matrix 



1 2 1 
3 -2 1 
1 2

21. Find a QR factorization for the matrix

22. If you had a QR factorization, A = QR, describe how you could use it to solve the equation
Ax = b. This is not usually the way people solve this equation. However, the QR factorization
is of great importance in certain other problems, especially in finding eigenvalues and
eigenvectors.

11.1 Simple Geometric Considerations

One of the most important uses of row operations is in solving linear programming problems, which
involve maximizing a linear function subject to inequality constraints determined from linear equations.
Here is an example. A certain hamburger store has 9000 hamburger patties to use in one week 
and a limitless supply of special sauce, lettuce, tomatoes, onions, and buns. They sell two types of 
hamburgers, the big stack and the basic burger. It has also been determined that the employees 
cannot prepare more than 9000 of either type in one week. The big stack, popular with the teenagers 
from the local high school, involves two patties, lots of delicious sauce, condiments galore, and a 
divider between the two patties. The basic burger, very popular with children, involves only one 
patty and some pickles and ketchup. Demand for the basic burger is twice what it is for the big 
stack. What is the maximum number of hamburgers which could be sold in one week given the 
above limitations? 

Let $x$ be the number of basic burgers and $y$ the number of big stacks which could be sold in
a week. Thus it is desired to maximize $z = x + y$ subject to the above constraints. The total
number of patties is 9000 and so the number of patties used is $x + 2y$. This number must satisfy
$x + 2y \le 9000$ because there are only 9000 patties available. Because of the limitation on the number
the employees can prepare and the demand, it follows $2x + y \le 9000$. You never sell a negative
number of hamburgers and so $x, y \ge 0$. In simpler terms the problem reduces to maximizing $z = x + y$
subject to the two constraints, $x + 2y \le 9000$ and $2x + y \le 9000$, together with $x, y \ge 0$. This
problem is pretty easy to solve geometrically. Consider the following picture in which $R$ labels the
region described by the above inequalities and the line $z = x + y$ is shown for a particular value of $z$.

[Figure: the region R and the line x + y = z]




As you make $z$ larger this line moves away from the origin, always having the same slope, and
the desired solution would consist of a point in the region $R$ which makes $z$ as large as possible or
equivalently one for which the line is as far as possible from the origin. This point is the point
of intersection of the two lines $x + 2y = 9000$ and $2x + y = 9000$, namely $(3000, 3000)$, and so the
maximum value of the given function is 6000. Of course this type of procedure is fine for a situation
in which there are only two variables, but what about a similar problem in which there are very many
variables? In reality, this hamburger store makes many more types of burgers than those two and there
are many considerations other than demand and available patties. Each will likely give you a constraint
which must be considered in order to solve a more realistic problem and the end result will likely be a
problem in many dimensions, probably many more than three, so your ability to draw a picture will get
you nowhere for such a problem. Another method is needed. This method is the topic of this section. I
will illustrate with this particular problem. Let $x_1 = x$ and $x_2 = y$. Also let $x_3$ and $x_4$ be
nonnegative variables such that

\[ x_1 + 2x_2 + x_3 = 9000, \quad 2x_1 + x_2 + x_4 = 9000. \]

To say that $x_3$ and $x_4$ are nonnegative is the same as saying $x_1 + 2x_2 \le 9000$ and $2x_1 + x_2 \le 9000$,
and these variables are called slack variables at this point. They are called this because they "take
up the slack". I will discuss these more later. First a general situation is considered.
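
The geometric argument above can be checked by brute force. The following is only a sketch, not part of the method developed in this chapter: it assumes the standard fact that a linear function on a bounded polygonal region takes its maximum at a corner, and the names `CONSTRAINTS` and `feasible_corners` are made up for this illustration. It intersects each pair of boundary lines, keeps the intersection points which satisfy every constraint, and evaluates $z = x + y$ at each of them.

```python
from fractions import Fraction
from itertools import combinations

# Each constraint is a*x + b*y <= c, stored as the triple (a, b, c).
CONSTRAINTS = [
    (1, 2, 9000),   # patties:      x + 2y <= 9000
    (2, 1, 9000),   # preparation:  2x + y <= 9000
    (-1, 0, 0),     # x >= 0
    (0, -1, 0),     # y >= 0
]

def feasible_corners(constraints):
    """Intersect every pair of boundary lines and keep the feasible points."""
    corners = set()
    for (a1, b1, c1), (a2, b2, c2) in combinations(constraints, 2):
        det = a1 * b2 - a2 * b1
        if det == 0:          # parallel boundary lines never meet
            continue
        x = Fraction(c1 * b2 - c2 * b1, det)   # Cramer's rule
        y = Fraction(a1 * c2 - a2 * c1, det)
        if all(a * x + b * y <= c for a, b, c in constraints):
            corners.add((x, y))
    return corners

corners = feasible_corners(CONSTRAINTS)
best = max(corners, key=lambda p: p[0] + p[1])
print(sorted(corners), best, best[0] + best[1])
```

With only two variables this is quick, but the number of pairs (and, in higher dimensions, of larger subsets of constraints) to test grows rapidly, which is one way to see why another method is needed.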

11.2 The Simplex Tableau 

Here is some notation. 

Definition 11.2.1 Let $x, y$ be vectors in $\mathbb{R}^q$. Then $x \le y$ means for each $i$, $x_i \le y_i$.

The problem is as follows: 

Let $A$ be an $m \times (m + n)$ real matrix of rank $m$. It is desired to find $x \in \mathbb{R}^{n+m}$ such that $x$
satisfies the constraints,

\[ x \ge 0, \quad Ax = b \tag{11.1} \]

and out of all such $x$,

\[ z \equiv \sum_{i=1}^{m+n} c_i x_i \]

is as large (or small) as possible. This is usually referred to as maximizing or minimizing $z$ subject
to the above constraints. First I will consider the constraints.

Let $A = \begin{pmatrix} a_1 & \cdots & a_{n+m} \end{pmatrix}$. First you find a vector $x^0 \ge 0$, $Ax^0 = b$ such that $n$ of the
components of this vector equal 0. Letting $i_1, \cdots, i_n$ be the positions of $x^0$ for which $x^0_i = 0$, suppose
also that $\{a_{j_1}, \cdots, a_{j_m}\}$ is linearly independent for $j_1, \cdots, j_m$ the other positions of $x^0$. Geometrically, this
means that $x^0$ is a corner of the feasible region, those $x$ which satisfy the constraints. This is called
a basic feasible solution. Also define

\[ c_B \equiv (c_{j_1}, \cdots, c_{j_m}), \quad c_F \equiv (c_{i_1}, \cdots, c_{i_n}), \]

\[ x_B \equiv (x_{j_1}, \cdots, x_{j_m}), \quad x_F \equiv (x_{i_1}, \cdots, x_{i_n}), \]

and

\[ z^0 \equiv z(x^0) = \begin{pmatrix} c_B & c_F \end{pmatrix} \begin{pmatrix} x_B^0 \\ x_F^0 \end{pmatrix} = c_B x_B^0 \]



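since $x_F^0 = 0$. The variables which are the components of the vector $x_B$ are called the basic
variables and the variables which are the entries of $x_F$ are called the free variables. You set
$x_F = 0$. Now $(x^0, z^0)$ is a solution to

\[ \begin{pmatrix} A & 0 \\ -c & 1 \end{pmatrix} \begin{pmatrix} x \\ z \end{pmatrix} = \begin{pmatrix} b \\ 0 \end{pmatrix} \]

along with the constraints $x \ge 0$. Writing the above in augmented matrix form yields

\[ \begin{pmatrix} A & 0 & b \\ -c & 1 & 0 \end{pmatrix} \tag{11.2} \]

Permute the columns and variables on the left if necessary to write the above in the form

\[ \begin{pmatrix} B & F & 0 \\ -c_B & -c_F & 1 \end{pmatrix} \begin{pmatrix} x_B \\ x_F \\ z \end{pmatrix} = \begin{pmatrix} b \\ 0 \end{pmatrix} \tag{11.3} \]

or equivalently in the augmented matrix form keeping track of the variables on the bottom as

\[ \begin{pmatrix} B & F & 0 & b \\ -c_B & -c_F & 1 & 0 \\ x_B & x_F & 0 & 0 \end{pmatrix} \tag{11.4} \]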




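Here $B$ pertains to the variables $x_{j_1}, \cdots, x_{j_m}$ and is an $m \times m$ matrix with linearly independent
columns, $\{a_{j_1}, \cdots, a_{j_m}\}$, and $F$ is an $m \times n$ matrix. Now it is assumed that

\[ \begin{pmatrix} B & F \end{pmatrix} \begin{pmatrix} x_B^0 \\ x_F^0 \end{pmatrix} = \begin{pmatrix} B & F \end{pmatrix} \begin{pmatrix} x_B^0 \\ 0 \end{pmatrix} = B x_B^0 = b \]

and since $B$ is assumed to have rank $m$, it follows

\[ x_B^0 = B^{-1} b \ge 0. \tag{11.5} \]

This is very important to observe. $B^{-1}b \ge 0$! This is by the assumption that $x^0 \ge 0$.
Do row operations on the top part of the matrix

\[ \begin{pmatrix} B & F & 0 & b \\ -c_B & -c_F & 1 & 0 \end{pmatrix} \tag{11.6} \]

and obtain its row reduced echelon form. Then after these row operations the above becomes

\[ \begin{pmatrix} I & B^{-1}F & 0 & B^{-1}b \\ -c_B & -c_F & 1 & 0 \end{pmatrix} \tag{11.7} \]

where $B^{-1}b \ge 0$. Next do another row operation in order to get a 0 where you see a $-c_B$. Thus

\[ \begin{pmatrix} I & B^{-1}F & 0 & B^{-1}b \\ 0 & c_B B^{-1}F - c_F & 1 & c_B B^{-1}b \end{pmatrix} \tag{11.8} \]

\[ = \begin{pmatrix} I & B^{-1}F & 0 & B^{-1}b \\ 0 & c_B B^{-1}F - c_F & 1 & c_B x_B^0 \end{pmatrix} = \begin{pmatrix} I & B^{-1}F & 0 & B^{-1}b \\ 0 & c_B B^{-1}F - c_F & 1 & z^0 \end{pmatrix} \tag{11.9} \]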

The reason there is a $z^0$ on the bottom right corner is that $x_F = 0$ and $(x_B^0, x_F^0, z^0)$ is a solution of
the system of equations represented by the above augmented matrix because it is a solution to the
system of equations represented by 11.6 and row operations leave solution sets unchanged. Note how
attractive this is. The $z^0$ is the value of $z$ at the point $x^0$.
The augmented matrix of 11.9 is called the simplex tableau and it is the beginning point for the
simplex algorithm to be described a little later. It is very convenient to express the simplex tableau
in the above form in which the variables are possibly permuted in order to have
$\begin{pmatrix} I \\ 0 \end{pmatrix}$ on the left side. However, as far as the simplex algorithm is concerned it is not
necessary to be permuting the variables in this manner. Starting with 11.9 you could permute the
variables and columns to obtain an augmented matrix in which the variables are in their original
order. What is really required for the simplex tableau?

It is an augmented $(m + 1) \times (m + n + 2)$ matrix which represents a system of equations which has
the same set of solutions, $(x, z)$, as the system whose augmented matrix is

\[ \begin{pmatrix} A & 0 & b \\ -c & 1 & 0 \end{pmatrix} \]

(Possibly the variables for $x$ are taken in another order.) There are $m$ linearly independent columns
in the first $m + n$ columns for which there is only one nonzero entry, a 1 in one of the first $m$
rows, the "simple columns", the other first $m + n$ columns being the "nonsimple columns". As in
the above, the variables corresponding to the simple columns are $x_B$, the basic variables, and those
corresponding to the nonsimple columns are $x_F$, the free variables. Also, the top $m$ entries of the
last column on the right are nonnegative. This is the description of a simplex tableau.

In a simplex tableau it is easy to spot a basic feasible solution. You can see one quickly by
setting the variables, $x_F$, corresponding to the nonsimple columns equal to zero. Then the other
variables, corresponding to the simple columns, are each equal to a nonnegative entry in the far right
column. Let's call this an "obvious basic feasible solution". If a solution is obtained by setting
the variables corresponding to the nonsimple columns equal to zero and solving for the variables
corresponding to the simple columns, whether or not these turn out to be nonnegative, this will be
referred to as an "obvious" solution. Let's also call the first $m + n$ entries in the bottom row the
"bottom left row". In a simplex tableau, the entry in the bottom right corner gives the value of the
variable being maximized or minimized when the obvious basic feasible solution is chosen.

The following is a special case of the general theory presented above and shows how such a 
special case can be fit into the above framework. The following example is rather typical of the sorts 
of problems considered. It involves inequality constraints instead of Ax = b. This is handled by 
adding in "slack variables" as explained below. 

The idea is to obtain an augmented matrix for the constraints such that obvious solutions are 
also feasible. Then there is an algorithm, to be presented later, which takes you from one obvious 
feasible solution to another until you obtain the maximum. 

Example 11.2.2 Consider $z = x_1 - x_2$ subject to the constraints, $x_1 + 2x_2 \le 10$, $x_1 + 2x_2 \ge 2$,
and $2x_1 + x_2 \le 6$, $x_i \ge 0$. Find a simplex tableau for a problem of the form $x \ge 0$, $Ax = b$ which is
equivalent to the above problem.

You add in slack variables. These are nonnegative variables, one for each of the first three constraints,
which change the first three inequalities into equations. Thus the first three inequalities become
$x_1 + 2x_2 + x_3 = 10$, $x_1 + 2x_2 - x_4 = 2$, and $2x_1 + x_2 + x_5 = 6$, $x_1, x_2, x_3, x_4, x_5 \ge 0$. Now it
is necessary to find a basic feasible solution. You mainly need to find a nonnegative solution to the
equations,

\[ \begin{aligned} x_1 + 2x_2 + x_3 &= 10 \\ x_1 + 2x_2 - x_4 &= 2 \\ 2x_1 + x_2 + x_5 &= 6 \end{aligned} \]

The solution set for the above system is given by

\[ x_2 = \frac{2}{3}x_4 - \frac{2}{3} + \frac{1}{3}x_5, \quad x_1 = -\frac{1}{3}x_4 + \frac{10}{3} - \frac{2}{3}x_5, \quad x_3 = -x_4 + 8. \]

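An easy way to get a basic feasible solution is to let $x_4 = 8$ and $x_5 = 1$. Then a feasible solution is

\[ (x_1, x_2, x_3, x_4, x_5) = (0, 5, 0, 8, 1). \]

It follows $z^0 = -5$ and the matrix 11.2,

\[ \begin{pmatrix} A & 0 & b \\ -c & 1 & 0 \end{pmatrix} \]

with the variables kept track of on the bottom is

\[ \begin{pmatrix} 1 & 2 & 1 & 0 & 0 & 0 & 10 \\ 1 & 2 & 0 & -1 & 0 & 0 & 2 \\ 2 & 1 & 0 & 0 & 1 & 0 & 6 \\ -1 & 1 & 0 & 0 & 0 & 1 & 0 \\ x_1 & x_2 & x_3 & x_4 & x_5 & 0 & 0 \end{pmatrix} \]

and the first thing to do is to permute the columns so that the list of variables on the bottom will
have $x_1$ and $x_3$ at the end.

\[ \begin{pmatrix} 2 & 0 & 0 & 1 & 1 & 0 & 10 \\ 2 & -1 & 0 & 1 & 0 & 0 & 2 \\ 1 & 0 & 1 & 2 & 0 & 0 & 6 \\ 1 & 0 & 0 & -1 & 0 & 1 & 0 \\ x_2 & x_4 & x_5 & x_1 & x_3 & 0 & 0 \end{pmatrix} \]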
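
The bookkeeping with slack variables is easy to get wrong by hand, so here is a small check, written as a sketch. The helper `add_slack` and the triple format for the constraints are assumptions of this illustration, not notation from the text. It appends one slack column per inequality (with $+1$ for a $\le$ constraint and $-1$ for a $\ge$ constraint) and then confirms that $(0, 5, 0, 8, 1)$ solves $Ax = b$ with $z^0 = -5$.

```python
# Constraints of Example 11.2.2 as (coefficients, sense, right side).
constraints = [
    ([1, 2], "<=", 10),
    ([1, 2], ">=", 2),
    ([2, 1], "<=", 6),
]

def add_slack(cons):
    """Return (A, b) with one slack column per inequality.

    A "<=" row gets +1 in its slack column and a ">=" row gets -1,
    so that every slack variable is required to be nonnegative.
    """
    n_slack = len(cons)
    A, b = [], []
    for k, (coeffs, sense, rhs) in enumerate(cons):
        slack = [0] * n_slack
        slack[k] = 1 if sense == "<=" else -1
        A.append(list(coeffs) + slack)
        b.append(rhs)
    return A, b

A, b = add_slack(constraints)
x0 = [0, 5, 0, 8, 1]      # the feasible solution found above
c = [1, -1, 0, 0, 0]      # z = x1 - x2

# A x0 - b should vanish, and z0 should be the value of z at x0.
residual = [sum(a * x for a, x in zip(row, x0)) - rhs for row, rhs in zip(A, b)]
z0 = sum(ci * xi for ci, xi in zip(c, x0))
print(A, residual, z0)
```

Exactly two of the five components of `x0` are zero, which is the count $n = 2$ required of a basic feasible solution here.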

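Next, as described above, take the row reduced echelon form of the top three lines of the above
matrix. This yields

\[ \begin{pmatrix} 1 & 0 & 0 & \frac{1}{2} & \frac{1}{2} & 0 & 5 \\ 0 & 1 & 0 & 0 & 1 & 0 & 8 \\ 0 & 0 & 1 & \frac{3}{2} & -\frac{1}{2} & 0 & 1 \end{pmatrix}. \]

Now do row operations to

\[ \begin{pmatrix} 1 & 0 & 0 & \frac{1}{2} & \frac{1}{2} & 0 & 5 \\ 0 & 1 & 0 & 0 & 1 & 0 & 8 \\ 0 & 0 & 1 & \frac{3}{2} & -\frac{1}{2} & 0 & 1 \\ 1 & 0 & 0 & -1 & 0 & 1 & 0 \end{pmatrix} \]

to finally obtain

\[ \begin{pmatrix} 1 & 0 & 0 & \frac{1}{2} & \frac{1}{2} & 0 & 5 \\ 0 & 1 & 0 & 0 & 1 & 0 & 8 \\ 0 & 0 & 1 & \frac{3}{2} & -\frac{1}{2} & 0 & 1 \\ 0 & 0 & 0 & -\frac{3}{2} & -\frac{1}{2} & 1 & -5 \end{pmatrix} \]

and this is a simplex tableau. The variables are $x_2, x_4, x_5, x_1, x_3, z$.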
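
The row operations of this example can be replayed with exact fractions. This is a sketch under stated assumptions: the function name `pivot` and the list-of-rows layout are choices made for this illustration, and the columns are taken in the permuted order $x_2, x_4, x_5, x_1, x_3, z$ followed by the right side. Making the first three columns simple, including in the bottom row, should reproduce the tableau just obtained.

```python
from fractions import Fraction as Fr

def pivot(T, r, c):
    """Scale row r so that T[r][c] is 1, then clear column c in every other row."""
    p = T[r][c]
    T[r] = [v / p for v in T[r]]
    for k in range(len(T)):
        if k != r and T[k][c] != 0:
            f = T[k][c]
            T[k] = [a - f * b for a, b in zip(T[k], T[r])]

# Columns: x2, x4, x5, x1, x3, z | right side (the permuted form of 11.2).
T = [[Fr(v) for v in row] for row in [
    [2, 0, 0, 1, 1, 0, 10],
    [2, -1, 0, 1, 0, 0, 2],
    [1, 0, 1, 2, 0, 0, 6],
    [1, 0, 0, -1, 0, 1, 0],
]]

# Turn the columns of the basic variables x2, x4, x5 into simple columns.
for r, c in [(0, 0), (1, 1), (2, 2)]:
    pivot(T, r, c)

for row in T:
    print([str(v) for v in row])
```

The last column then reads 5, 8, 1, -5, which is $B^{-1}b$ followed by $z^0$.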

It isn't as hard as it may appear from the above. Let's not permute the variables and simply find
an acceptable simplex tableau as described above.



Example 11.2.3 Consider $z = x_1 - x_2$ subject to the constraints, $x_1 + 2x_2 \le 10$, $x_1 + 2x_2 \ge 2$, and
$2x_1 + x_2 \le 6$, $x_i \ge 0$. Find a simplex tableau.



Adding in slack variables, an augmented matrix which is descriptive of the constraints is

\[ \begin{pmatrix} 1 & 2 & 1 & 0 & 0 & 10 \\ 1 & 2 & 0 & -1 & 0 & 2 \\ 2 & 1 & 0 & 0 & 1 & 6 \end{pmatrix} \]

The obvious solution is not feasible because of that $-1$ in the fourth column. When you let $x_1, x_2 = 0$,
you end up having $x_4 = -2$ which is negative. Consider the second column and select the 2 in the second
row as a pivot to zero out that which is above and below the 2.

\[ \begin{pmatrix} 0 & 0 & 1 & 1 & 0 & 8 \\ \frac{1}{2} & 1 & 0 & -\frac{1}{2} & 0 & 1 \\ \frac{3}{2} & 0 & 0 & \frac{1}{2} & 1 & 5 \end{pmatrix} \]

This one is good. When you let $x_1 = x_4 = 0$, you find that $x_2 = 1$, $x_3 = 8$, $x_5 = 5$. The obvious
solution is now feasible. You can now assemble the simplex tableau. The first step is to include a
column and row for $z$. This yields

\[ \begin{pmatrix} 0 & 0 & 1 & 1 & 0 & 0 & 8 \\ \frac{1}{2} & 1 & 0 & -\frac{1}{2} & 0 & 0 & 1 \\ \frac{3}{2} & 0 & 0 & \frac{1}{2} & 1 & 0 & 5 \\ -1 & 1 & 0 & 0 & 0 & 1 & 0 \end{pmatrix} \]

Now you need to get zeros in the right places so the simple columns will be preserved as simple
columns in this larger matrix. This means you need to zero out the 1 in the second column on the
bottom. A simplex tableau is now

\[ \begin{pmatrix} 0 & 0 & 1 & 1 & 0 & 0 & 8 \\ \frac{1}{2} & 1 & 0 & -\frac{1}{2} & 0 & 0 & 1 \\ \frac{3}{2} & 0 & 0 & \frac{1}{2} & 1 & 0 & 5 \\ -\frac{3}{2} & 0 & 0 & \frac{1}{2} & 0 & 1 & -1 \end{pmatrix} \]

Note it is not the same one obtained earlier. There is no reason a simplex tableau should be unique.
In fact, it follows from the above general description that you have one for each basic feasible point
of the region determined by the constraints.

11.3 The Simplex Algorithm 

11.3.1 Maximums 

The simplex algorithm takes you from one basic feasible solution to another while maximizing or 
minimizing the function you are trying to maximize or minimize. Algebraically, it takes you from one 
simplex tableau to another in which the lower right corner either increases in the case of maximization 
or decreases in the case of minimization. 

I will continue writing the simplex tableau in such a way that the simple columns having only
one entry nonzero are on the left. As explained above, this amounts to permuting the variables. I
will do this because it is possible to describe what is going on without onerous notation. However,
in the examples, I won't worry so much about it. Thus, from a basic feasible solution, a simplex
tableau of the following form has been obtained in which the columns for the basic variables, $x_B$, are
listed first and $b \ge 0$.

\[ \begin{pmatrix} I & F & 0 & b \\ 0 & c & 1 & z^0 \end{pmatrix} \tag{11.10} \]



Let $x_i^0 = b_i$ for $i = 1, \cdots, m$ and $x_i^0 = 0$ for $i > m$. Then $(x^0, z^0)$ is a solution to the above system
and since $b \ge 0$, it follows $(x^0, z^0)$ is a basic feasible solution.

If $c_i < 0$ for some $i$, and if $F_{ji} \le 0$ for each $j$ so that a whole column of $\begin{pmatrix} F \\ c \end{pmatrix}$ is $\le 0$ with the
bottom entry $< 0$, then letting $x_i$ be the variable corresponding to that column, you could leave all the
other entries of $x_F$ equal to zero but change $x_i$ to be positive. Let the new vector be denoted by $x_F'$ and
letting $x_B' = b - F x_F'$ it follows

\[ (x_B')_k = b_k - \sum_j F_{kj} (x_F')_j = b_k - F_{ki} x_i \ge 0. \]

Now this shows $(x_B', x_F')$ is feasible whenever $x_i > 0$ and so you could let $x_i$ become arbitrarily large
and positive and conclude there is no maximum for $z$ because

\[ z = (-c_i) x_i + z^0. \tag{11.11} \]

If this happens in a simplex tableau, you can say there is no maximum and stop.

What if $c \ge 0$? Then $z = z^0 - c x_F$ and to satisfy the constraints, you need $x_F \ge 0$. Therefore, in
this case, $z^0$ is the largest possible value of $z$ and so the maximum has been found. You stop when
this occurs. Next I explain what to do if neither of the above stopping conditions hold.

The only case which remains is that some $c_i < 0$ and some $F_{ji} > 0$. You pick a column in $\begin{pmatrix} F \\ c \end{pmatrix}$
in which $c_i < 0$, usually the one for which $c_i$ is the largest in absolute value. You pick $F_{ji} > 0$ as
a pivot element, divide the $j^{th}$ row by $F_{ji}$ and then use this row to obtain zeros above $F_{ji}$ and below
$F_{ji}$, thus obtaining a new simple column. This row operation also makes exactly one of the other simple
columns into a nonsimple column. (In terms of variables, it is said that a free variable becomes
a basic variable and a basic variable becomes a free variable.) Now permuting the columns and
variables, yields

\[ \begin{pmatrix} I & F' & 0 & b' \\ 0 & c' & 1 & z^{0\prime} \end{pmatrix} \]

where $z^{0\prime} \ge z^0$ because $z^{0\prime} = z^0 - c_i \left( \frac{b_j}{F_{ji}} \right)$ and $c_i < 0$. If $b' \ge 0$, you are in the same position
you were at the beginning but now $z^0$ is larger. Now here is the important thing. You don't pick
just any $F_{ji}$ when you do these row operations. You pick the positive one for which the row
operation results in $b' \ge 0$. Otherwise the obvious basic feasible solution obtained by letting
$x_F' = 0$ will fail to satisfy the constraint that $x \ge 0$.
How is this done? You need

\[ b_k' \equiv b_k - \frac{F_{ki} b_j}{F_{ji}} \ge 0 \tag{11.12} \]

for each $k = 1, \cdots, m$ or equivalently,

\[ b_k \ge \frac{F_{ki} b_j}{F_{ji}}. \tag{11.13} \]

Now if $F_{ki} \le 0$ the above holds. Therefore, you only need to check $F_{pi}$ for $F_{pi} > 0$. The pivot,
$F_{ji}$, is the one which makes the quotients of the form

\[ \frac{b_p}{F_{pi}} \]

for all positive $F_{pi}$ the smallest. This will work because for $F_{ki} > 0$,

\[ \frac{b_p}{F_{pi}} \le \frac{b_k}{F_{ki}} \Rightarrow b_k \ge \frac{F_{ki} b_p}{F_{pi}}. \]

Having gotten a new simplex tableau, you do the same thing to it which was just done and continue.
As long as $b > 0$, so you don't encounter the degenerate case, the values for $z$ associated with setting
$x_F = 0$ keep getting strictly larger every time the process is repeated. You keep going until you find
$c \ge 0$. Then you stop. You are at a maximum. Problems can occur in the process in the so called
degenerate case when at some stage of the process some $b_j = 0$. In this case you can cycle through
different values for $x$ with no improvement in $z$. This case will not be discussed here.
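
The rule for choosing the pivot row is short enough to write down as code. The following is a sketch only; the names `choose_pivot_row` and `rhs_after_pivot` are hypothetical and simply restate 11.12 and the smallest quotient rule. It is tried on the column $(1/2, 0, 3/2)$ with right side $(5, 8, 1)$, which is the fourth column of the tableau found in Example 11.2.2.

```python
from fractions import Fraction as Fr

def choose_pivot_row(F_col, b):
    """Return the row j with F_col[j] > 0 which minimizes b[j] / F_col[j].

    Returns None when no entry of the column is positive, which is the
    "no maximum" stopping condition when the bottom entry is negative.
    """
    candidates = [p for p, f in enumerate(F_col) if f > 0]
    if not candidates:
        return None
    return min(candidates, key=lambda p: Fr(b[p]) / Fr(F_col[p]))

def rhs_after_pivot(F_col, b, j):
    """New right column b' after pivoting on F_col[j], as in 11.12."""
    ratio = Fr(b[j]) / Fr(F_col[j])
    return [ratio if k == j else Fr(b[k]) - Fr(F_col[k]) * ratio
            for k in range(len(b))]

col = [Fr(1, 2), Fr(0), Fr(3, 2)]
b = [5, 8, 1]

j = choose_pivot_row(col, b)
print(j, rhs_after_pivot(col, b, j))   # the rule picks the row of the 3/2
print(rhs_after_pivot(col, b, 0))      # pivoting on the 1/2 instead
```

Pivoting on the other positive entry, the 1/2, produces a negative entry in the right column, which is exactly what the smallest quotient rule is designed to prevent.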

Example 11.3.1 Maximize $2x_1 + 3x_2$ subject to the constraints $x_1 + x_2 \ge 1$, $2x_1 + x_2 \le 6$, $x_1 + 2x_2 \le 6$,
$x_1, x_2 \ge 0$.

The constraints are of the form

\[ \begin{aligned} x_1 + x_2 - x_3 &= 1 \\ 2x_1 + x_2 + x_4 &= 6 \\ x_1 + 2x_2 + x_5 &= 6 \end{aligned} \]

where the $x_3, x_4, x_5$ are the slack variables. An augmented matrix for these equations is of the form

\[ \begin{pmatrix} 1 & 1 & -1 & 0 & 0 & 1 \\ 2 & 1 & 0 & 1 & 0 & 6 \\ 1 & 2 & 0 & 0 & 1 & 6 \end{pmatrix} \]

Obviously the obvious solution is not feasible. It results in $x_3 < 0$. We need to exchange basic
variables. Let's just try something.

\[ \begin{pmatrix} 1 & 1 & -1 & 0 & 0 & 1 \\ 0 & -1 & 2 & 1 & 0 & 4 \\ 0 & 1 & 1 & 0 & 1 & 5 \end{pmatrix} \]

Now this one is all right because the obvious solution is feasible. Letting $x_2 = x_3 = 0$, it follows
that $x_1 = 1$, $x_4 = 4$, $x_5 = 5$. Now we add in the objective function as described above.



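\[ \begin{pmatrix} 1 & 1 & -1 & 0 & 0 & 0 & 1 \\ 0 & -1 & 2 & 1 & 0 & 0 & 4 \\ 0 & 1 & 1 & 0 & 1 & 0 & 5 \\ -2 & -3 & 0 & 0 & 0 & 1 & 0 \end{pmatrix} \]

Do row operations to leave the simple columns the same. Then

\[ \begin{pmatrix} 1 & 1 & -1 & 0 & 0 & 0 & 1 \\ 0 & -1 & 2 & 1 & 0 & 0 & 4 \\ 0 & 1 & 1 & 0 & 1 & 0 & 5 \\ 0 & -1 & -2 & 0 & 0 & 1 & 2 \end{pmatrix} \]

Now there are negative numbers on the bottom row to the left of the 1. Let's pick the first. (It
would be more sensible to pick the second.) The ratios to look at are 5/1, 1/1 so pick for the pivot
the 1 in the second column and first row. This will leave the right column above the lower right
corner nonnegative. Thus the next tableau is

\[ \begin{pmatrix} 1 & 1 & -1 & 0 & 0 & 0 & 1 \\ 1 & 0 & 1 & 1 & 0 & 0 & 5 \\ -1 & 0 & 2 & 0 & 1 & 0 & 4 \\ 1 & 0 & -3 & 0 & 0 & 1 & 3 \end{pmatrix} \]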


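There is still a negative number there to the left of the 1 in the bottom row. The new ratios are
4/2, 5/1 so the new pivot is the 2 in the third column. Thus the next tableau is

\[ \begin{pmatrix} \frac{1}{2} & 1 & 0 & 0 & \frac{1}{2} & 0 & 3 \\ \frac{3}{2} & 0 & 0 & 1 & -\frac{1}{2} & 0 & 3 \\ -\frac{1}{2} & 0 & 1 & 0 & \frac{1}{2} & 0 & 2 \\ -\frac{1}{2} & 0 & 0 & 0 & \frac{3}{2} & 1 & 9 \end{pmatrix} \]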

Still, there is a negative number in the bottom row to the left of the 1 so the process does not stop
yet. The ratios are 3/(3/2) and 3/(1/2) and so the new pivot is that 3/2 in the first column. Thus
the new tableau is

\[ \begin{pmatrix} 0 & 1 & 0 & -\frac{1}{3} & \frac{2}{3} & 0 & 2 \\ 1 & 0 & 0 & \frac{2}{3} & -\frac{1}{3} & 0 & 2 \\ 0 & 0 & 1 & \frac{1}{3} & \frac{1}{3} & 0 & 3 \\ 0 & 0 & 0 & \frac{1}{3} & \frac{4}{3} & 1 & 10 \end{pmatrix} \]

Now stop. The maximum value is 10. This is an easy enough problem to do geometrically and so you
can easily verify that this is the right answer. It occurs when $x_4 = x_5 = 0$, $x_1 = 2$, $x_2 = 2$, $x_3 = 3$.
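
For comparison, here is the whole procedure for maximums written as a short loop. It is a sketch, not a robust implementation: `pivot` and `simplex_max` are names invented here, the degenerate case is ignored just as in the discussion above, and the column is always chosen by the "most negative entry" rule, so the intermediate tableaus differ from the ones in this example even though the final answer agrees.

```python
from fractions import Fraction as Fr

def pivot(T, r, c):
    """Scale row r so that T[r][c] is 1, then clear column c in every other row."""
    p = T[r][c]
    T[r] = [v / p for v in T[r]]
    for k in range(len(T)):
        if k != r and T[k][c] != 0:
            f = T[k][c]
            T[k] = [a - f * b for a, b in zip(T[k], T[r])]

def simplex_max(T):
    """Pivot until the bottom left row has no negative entry; return the maximum."""
    m = len(T) - 1             # number of constraint rows
    ncols = len(T[0]) - 2      # variable columns (the z column and right side excluded)
    while True:
        bottom = T[m][:ncols]
        c = min(range(ncols), key=lambda j: bottom[j])
        if bottom[c] >= 0:
            return T[m][-1]    # stopping condition c >= 0
        rows = [i for i in range(m) if T[i][c] > 0]
        if not rows:
            raise ValueError("no maximum: a whole column is nonpositive")
        r = min(rows, key=lambda i: T[i][-1] / T[i][c])   # smallest quotient
        pivot(T, r, c)

# Simplex tableau of this example; columns x1..x5, z | right side.
T = [[Fr(v) for v in row] for row in [
    [1, 1, -1, 0, 0, 0, 1],
    [0, -1, 2, 1, 0, 0, 4],
    [0, 1, 1, 0, 1, 0, 5],
    [0, -1, -2, 0, 0, 1, 2],
]]
z_max = simplex_max(T)
print(z_max, [str(row[-1]) for row in T])
```

Starting with the column of the $-2$ instead of the column of the $-1$ reaches the same final tableau after two pivots rather than three, which is the sense in which picking the second negative entry was the more sensible choice.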

11.3.2 Minimums 

How does it differ if you are finding a minimum? From a basic feasible solution, a simplex tableau
of the following form has been obtained in which the simple columns for the basic variables, $x_B$, are
listed first and $b \ge 0$.

\[ \begin{pmatrix} I & F & 0 & b \\ 0 & c & 1 & z^0 \end{pmatrix} \tag{11.14} \]

Let $x_i^0 = b_i$ for $i = 1, \cdots, m$ and $x_i^0 = 0$ for $i > m$. Then $(x^0, z^0)$ is a solution to the above system
and since $b \ge 0$, it follows $(x^0, z^0)$ is a basic feasible solution. So far, there is no change.

Suppose first that some $c_i > 0$ and $F_{ji} \le 0$ for each $j$. Then let $x_F'$ consist of changing $x_i$ by
making it positive but leaving the other entries of $x_F$ equal to 0. Then from the bottom row,

\[ z = -c_i x_i + z^0 \]

and you let $x_B' = b - F x_F' \ge 0$. Thus the constraints continue to hold when $x_i$ is made increasingly
positive and it follows from the above equation that there is no minimum for $z$. You stop when this
happens.

Next suppose $c \le 0$. Then in this case, $z = z^0 - c x_F$ and from the constraints, $x_F \ge 0$ and so
$-c x_F \ge 0$ and so $z^0$ is the minimum value and you stop since this is what you are looking for.

What do you do in the case where some $c_i > 0$ and some $F_{ji} > 0$? In this case, you use the simplex
algorithm as in the case of maximums to obtain a new simplex tableau in which $z^{0\prime}$ is smaller. You
choose $F_{ji}$ the same way to be the positive entry of the $i^{th}$ column such that $b_p / F_{pi} \ge b_j / F_{ji}$ for all
positive entries $F_{pi}$ and do the same row operations. Now this time,

\[ z^{0\prime} = z^0 - c_i \left( \frac{b_j}{F_{ji}} \right) < z^0. \]

As in the case of maximums no problem can occur and the process will converge unless you have the
degenerate case in which some $b_j = 0$. As in the earlier case, this is most unfortunate when it occurs.
You see what happens of course. $z^0$ does not change and the algorithm just delivers different values
of the variables forever with no improvement.

To summarize the geometrical significance of the simplex algorithm, it takes you from one corner
of the feasible region to another. You go in one direction to find the maximum and in another to
find the minimum. For the maximum you try to get rid of negative entries of $c$ and for minimums
you try to eliminate positive entries of $c$, where the method of elimination involves the judicious
use of an appropriate pivot element and row operations.

Now return to Example 11.2.2. It will be modified to be a maximization problem. 

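Example 11.3.2 Maximize $z = x_1 - x_2$ subject to the constraints,

$x_1 + 2x_2 \le 10$, $x_1 + 2x_2 \ge 2$,
and $2x_1 + x_2 \le 6$, $x_i \ge 0$.

Recall this is the same as maximizing $z = x_1 - x_2$ subject to

\[ \begin{pmatrix} 1 & 2 & 1 & 0 & 0 \\ 1 & 2 & 0 & -1 & 0 \\ 2 & 1 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{pmatrix} = \begin{pmatrix} 10 \\ 2 \\ 6 \end{pmatrix}, \quad x \ge 0, \]

the variables $x_3, x_4, x_5$ being slack variables. Recall the simplex tableau was

\[ \begin{pmatrix} 1 & 0 & 0 & \frac{1}{2} & \frac{1}{2} & 0 & 5 \\ 0 & 1 & 0 & 0 & 1 & 0 & 8 \\ 0 & 0 & 1 & \frac{3}{2} & -\frac{1}{2} & 0 & 1 \\ 0 & 0 & 0 & -\frac{3}{2} & -\frac{1}{2} & 1 & -5 \end{pmatrix} \]

with the variables ordered as $x_2, x_4, x_5, x_1, x_3$ and so $x_B = (x_2, x_4, x_5)$ and

\[ x_F = (x_1, x_3). \]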

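Apply the simplex algorithm to the fourth column because $-\frac{3}{2} < 0$ and this is the most negative
entry in the bottom row. The pivot is 3/2 because 1/(3/2) = 2/3 < 5/(1/2). Dividing this row by
3/2 and then using this to zero out the other elements in that column, the new simplex tableau is

\[ \begin{pmatrix} 1 & 0 & -\frac{1}{3} & 0 & \frac{2}{3} & 0 & \frac{14}{3} \\ 0 & 1 & 0 & 0 & 1 & 0 & 8 \\ 0 & 0 & \frac{2}{3} & 1 & -\frac{1}{3} & 0 & \frac{2}{3} \\ 0 & 0 & 1 & 0 & -1 & 1 & -4 \end{pmatrix} \]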

Now there is still a negative number in the bottom left row. Therefore, the process should be
continued. This time the pivot is the 2/3 in the top of the column. Dividing the top row by 2/3
and then using this to zero out the entries below it,

\[ \begin{pmatrix} \frac{3}{2} & 0 & -\frac{1}{2} & 0 & 1 & 0 & 7 \\ -\frac{3}{2} & 1 & \frac{1}{2} & 0 & 0 & 0 & 1 \\ \frac{1}{2} & 0 & \frac{1}{2} & 1 & 0 & 0 & 3 \\ \frac{3}{2} & 0 & \frac{1}{2} & 0 & 0 & 1 & 3 \end{pmatrix} \]

Now all the numbers on the bottom left row are nonnegative so the process stops. Now recall the
variables and columns were ordered as $x_2, x_4, x_5, x_1, x_3$. The solution in terms of $x_1$ and $x_2$ is $x_2 = 0$
and $x_1 = 3$ and $z = 3$. Note that in the above, I did not worry about permuting the columns to
keep those which go with the basic variables on the left.
Here is a bucolic example. 

Example 11.3.3 Consider the following table.

              F1   F2   F3   F4
iron           1    2    1    3
protein        5    3    2    1
folic acid     1    2    2    1
copper         2    1    1    1
calcium        1    1    1    1

This information is available to a pig farmer and $F_i$ denotes a particular feed. The numbers in the
table contain the number of units of a particular nutrient contained in one pound of the given feed.
Thus $F_2$ has 2 units of iron in one pound. Now suppose the cost of each feed in cents per pound is
given in the following table.

F1   F2   F3   F4
 2    3    2    3

A typical pig needs 5 units of iron, 8 of protein, 6 of folic acid, 7 of copper and 4 of calcium. (The 
units may change from nutrient to nutrient.) How many pounds of each feed per pig should the pig 
farmer use in order to minimize his cost? 

His problem is to minimize $C \equiv 2x_1 + 3x_2 + 2x_3 + 3x_4$ subject to the constraints

\[ \begin{aligned} x_1 + 2x_2 + x_3 + 3x_4 &\ge 5, \\ 5x_1 + 3x_2 + 2x_3 + x_4 &\ge 8, \\ x_1 + 2x_2 + 2x_3 + x_4 &\ge 6, \\ 2x_1 + x_2 + x_3 + x_4 &\ge 7, \\ x_1 + x_2 + x_3 + x_4 &\ge 4, \end{aligned} \]

where each $x_i \ge 0$. Add in the slack variables,

\[ \begin{aligned} x_1 + 2x_2 + x_3 + 3x_4 - x_5 &= 5 \\ 5x_1 + 3x_2 + 2x_3 + x_4 - x_6 &= 8 \\ x_1 + 2x_2 + 2x_3 + x_4 - x_7 &= 6 \\ 2x_1 + x_2 + x_3 + x_4 - x_8 &= 7 \\ x_1 + x_2 + x_3 + x_4 - x_9 &= 4 \end{aligned} \]

The augmented matrix for this system is

\[ \begin{pmatrix} 1 & 2 & 1 & 3 & -1 & 0 & 0 & 0 & 0 & 5 \\ 5 & 3 & 2 & 1 & 0 & -1 & 0 & 0 & 0 & 8 \\ 1 & 2 & 2 & 1 & 0 & 0 & -1 & 0 & 0 & 6 \\ 2 & 1 & 1 & 1 & 0 & 0 & 0 & -1 & 0 & 7 \\ 1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 & -1 & 4 \end{pmatrix} \]



How in the world can you find a basic feasible solution? Remember the simplex algorithm is designed 
to keep the entries in the right column nonnegative so you use this algorithm a few times till the 
obvious solution is a basic feasible solution. 

Consider the first column. The pivot is the 5. Using the row operations described in the
algorithm, you get



5 



-1 


1 
5 1 














1 
,5 
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-1 
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-1 

















17 



k 
k 

1 5 2 



Now go to the second column. The pivot in this column is the 7/5. This is in a different row than 
the pivot in the first column so I will use it to zero out everything below it. This will get rid of the 
zeros in the fifth column and introduce zeros in the second. This yields 
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1 
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-1 
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I 1 
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-2 
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2 
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-1 





30 
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Vo 








2 ? 
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-1 


k) 



Now consider another column, this time the fourth. I will pick this one because it has some negative
numbers in it so there are fewer entries to check in looking for a pivot. Unfortunately, the pivot is
the top 2 and I don't want to pivot on this because it would destroy the zeros in the second column.
Consider the fifth column. It is also not a good choice because the pivot is the second element from
the top and this would destroy the zeros in the first column. Consider the sixth column. I can use
either of the two bottom entries as the pivot. The matrix is



/ 1 2 


-1 
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10 1-1 
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0-2 
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1-2 
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-10 
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0-11 


-1 


0-13 
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2 1 
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10 ) 


olumn. The pivot 


is the 1 in the third row 


. This 


/ 1 2 


-1 
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1 \ 


10 1 





1 -2 
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1-2 
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-10 
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0-1 





-1 -1 3 


1 




\ 6 


-1 1 


3 0-7 


W 





There are still 5 columns which consist entirely of zeros except for one entry. Four of them have
that entry equal to 1 but one still has a $-1$ in it, the $-1$ being in the fourth column. I need to do the
row operations on a nonsimple column which has the pivot in the fourth row. Such a column is the
second to the last. The pivot is the 3. The new matrix is

(11.15)

Now the obvious basic solution is feasible. You let $x_4 = 0 = x_5 = x_7 = x_8$ and $x_1 = 8/3$, $x_2 = 2/3$,
$x_3 = 1$, and $x_6 = 28/3$. You don't need to worry too much about this. It is the above matrix
which is desired. Now you can assemble the simplex tableau and begin the algorithm. Remember
$C \equiv 2x_1 + 3x_2 + 2x_3 + 3x_4$. First add the row and column which deal with $C$. This yields

(11.16)

Now you do row operations to keep the simple columns of 11.15 simple in 11.16. Of course you could
permute the columns if you wanted but this is not necessary.

This yields the following for a simplex tableau. Now it is a matter of getting rid of the positive
entries in the bottom row because you are trying to minimize.

The most positive of them is the 2/3 and so I will apply the algorithm to this one first. The pivot
is the 7/3. After doing the row operation the next tableau is

and you see that all the entries in the bottom left row are nonpositive and so the minimum is 64/7
and it occurs when $x_1 = 18/7$, $x_2 = 0$, $x_3 = 11/7$, $x_4 = 2/7$.
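
It is worth checking the stated answer against the original tables. The sketch below does only that: it confirms that the feed amounts satisfy every nutrient requirement and cost 64/7 cents, and it does not by itself show that no cheaper mixture exists. The dictionary `nutrients` and the helper `amount` are names chosen for this illustration.

```python
from fractions import Fraction as Fr

# Units of each nutrient per pound of F1..F4, and the amount a pig needs.
nutrients = {
    "iron":       ([1, 2, 1, 3], 5),
    "protein":    ([5, 3, 2, 1], 8),
    "folic acid": ([1, 2, 2, 1], 6),
    "copper":     ([2, 1, 1, 1], 7),
    "calcium":    ([1, 1, 1, 1], 4),
}
cost = [2, 3, 2, 3]                              # cents per pound of F1..F4
x = [Fr(18, 7), Fr(0), Fr(11, 7), Fr(2, 7)]      # pounds of each feed

def amount(coeffs, x):
    """Dot product of a coefficient row with the feed amounts."""
    return sum(a * v for a, v in zip(coeffs, x))

# Surplus over the requirement; these are the values of the slack variables.
surplus = {name: amount(coeffs, x) - need
           for name, (coeffs, need) in nutrients.items()}
total_cost = amount(cost, x)
print({k: str(v) for k, v in surplus.items()}, total_cost)
```

The iron, folic acid and copper requirements are met exactly, so the corresponding slack variables are zero at this corner of the feasible region.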

There is no maximum for the above problem. However, I will pretend I don't know this and
attempt to use the simplex algorithm. You set up the simplex tableau the same way. Recall it is

Now to maximize, you try to get rid of the negative entries in the bottom left row. The most
negative entry is the $-1$ in the fifth column. The pivot is the 1 in the third row of this column. The
new tableau is

Consider the fourth column. The pivot is the top 1/3. The new tableau is

There is still a negative in the bottom, the $-4$. The pivot in that column is the 3. The algorithm
yields

Note how $z$ keeps getting larger. Consider the column having the $-13/3$ in it. The pivot is the
single positive entry, 1/3. The next tableau is

There is a column consisting of all negative entries. There is therefore no maximum. Note also how
there is no way to pick the pivot in that column.



Example 11.3.4 Minimize $z = x_1 - 3x_2 + x_3$ subject to the constraints $x_1 + x_2 + x_3 \le 10$,
$x_1 + x_2 + x_3 \ge 2$, $x_1 + x_2 + 3x_3 \le 8$ and $x_1 + 2x_2 + x_3 \le 7$ with all variables nonnegative.



There exists an answer because the region defined by the constraints is closed and bounded. 
Adding in slack variables you get the following augmented matrix corresponding to the constraints. 

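$$\left(\begin{array}{ccccccc|c}
1 & 1 & 1 & 1 & 0 & 0 & 0 & 10 \\
1 & 1 & 1 & 0 & -1 & 0 & 0 & 2 \\
1 & 1 & 3 & 0 & 0 & 1 & 0 & 8 \\
1 & 2 & 1 & 0 & 0 & 0 & 1 & 7
\end{array}\right)$$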

Of course there is a problem with the obvious solution obtained by setting to zero all variables 
corresponding to a nonsimple column because of the simple column which has the —1 in it. Therefore, 
I will use the simplex algorithm to make this column non simple. The third column has the 1 in the 
second row as the pivot so I will use this column. This yields 



(11.17) 



and the obvious solution is feasible. Now it is time to assemble the simplex tableau. First add in 
the bottom row and second to last column corresponding to the equation for z. This yields 
Next you need to zero out the entries in the bottom row which are below one of the simple columns
in 11.17. This yields the simplex tableau
The desire is to minimize this so you need to get rid of the positive entries in the left bottom row.
There is only one such entry, the 4. In that column the pivot is the 1 in the second row of this
column. Thus the next tableau is
There is still a positive number there, the 3. The pivot in this column is the 2. Apply the algorithm
again. This yields






Now all the entries in the left bottom row are nonpositive so the process has stopped. The minimum
is $-21/2$. It occurs when $x_1 = 0,\ x_2 = 7/2,\ x_3 = 0$.

Now consider the same problem but with the word minimize changed to maximize.

Example 11.3.5 Maximize $z = x_1 - 3x_2 + x_3$ subject to the constraints $x_1 + x_2 + x_3 \le 10$,
$x_1 + x_2 + x_3 \ge 2$, $x_1 + x_2 + 3x_3 \le 8$ and $x_1 + 2x_2 + x_3 \le 7$ with all variables nonnegative.

The first part of it is the same. You wind up with the same simplex tableau, 
but this time, you apply the algorithm to get rid of the negative entries in the left bottom row.
There is a -1. Use this column. The pivot is the 3. The next tableau is


There is still a negative entry, the —2/3. This will be the new pivot column. The pivot is the 2/3 
on the fourth row. This yields 
and the process stops. The maximum for $z$ is 7 and it occurs when $x_1 = 13/2,\ x_2 = 0,\ x_3 = 1/2$.
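
Since the tableaus are tedious to produce by hand, it is worth checking the two answers above independently. The following is only a sanity check, not the simplex algorithm: it confirms that the two claimed points satisfy every constraint of Examples 11.3.4 and 11.3.5 and reproduces the stated values of $z$. The names `CONSTRAINTS`, `is_feasible` and `dot` are made up for this sketch.

```python
from fractions import Fraction as Fr

# Constraints shared by Examples 11.3.4 and 11.3.5: (coefficients, sense, right side).
CONSTRAINTS = [
    ((1, 1, 1), "<=", 10),
    ((1, 1, 1), ">=", 2),
    ((1, 1, 3), "<=", 8),
    ((1, 2, 1), "<=", 7),
]
OBJECTIVE = (1, -3, 1)  # z = x1 - 3*x2 + x3


def dot(u, v):
    """Ordinary dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def is_feasible(x):
    """True when x is nonnegative and satisfies every constraint."""
    if any(v < 0 for v in x):
        return False
    for coeffs, sense, rhs in CONSTRAINTS:
        lhs = dot(coeffs, x)
        if sense == "<=" and lhs > rhs:
            return False
        if sense == ">=" and lhs < rhs:
            return False
    return True


x_min = (Fr(0), Fr(7, 2), Fr(0))      # claimed minimizer
x_max = (Fr(13, 2), Fr(0), Fr(1, 2))  # claimed maximizer

print(is_feasible(x_min), dot(OBJECTIVE, x_min))
print(is_feasible(x_max), dot(OBJECTIVE, x_max))
```

A check of this kind cannot prove optimality, but it catches arithmetic slips in the row operations quickly.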



11.4 Finding A Basic Feasible Solution 

By now it should be fairly clear that finding a basic feasible solution can create considerable
difficulty. Indeed, given a system of linear inequalities along with the requirement that each variable
be nonnegative, do there even exist points satisfying all these inequalities? If you have many
variables, you can't answer this by drawing a picture. Is there some other way to do this which is more
systematic than what was presented above? The answer is yes. It is called the method of artificial
variables. I will illustrate this method with an example.



Example 11.4.1 Find a basic feasible solution to the system $2x_1 + x_2 - x_3 \ge 3$, $x_1 + x_2 + x_3 \ge 2$,
$x_1 + x_2 + x_3 \le 7$ and $x \ge 0$.
If you write the appropriate augmented matrix with the slack variables,
$$\left(\begin{array}{cccccc|c}
2 & 1 & -1 & -1 & 0 & 0 & 3 \\
1 & 1 & 1 & 0 & -1 & 0 & 2 \\
1 & 1 & 1 & 0 & 0 & 1 & 7
\end{array}\right) \tag{11.18}$$



The obvious solution is not feasible. This is why it would be hard to get started with the simplex 
method. What is the problem? It is those —1 entries in the fourth and fifth columns. To get around 
this, you add in artificial variables to get an augmented matrix of the form 



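$$\left(\begin{array}{cccccccc|c}
2 & 1 & -1 & -1 & 0 & 0 & 1 & 0 & 3 \\
1 & 1 & 1 & 0 & -1 & 0 & 0 & 1 & 2 \\
1 & 1 & 1 & 0 & 0 & 1 & 0 & 0 & 7
\end{array}\right) \tag{11.19}$$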
Thus the variables are $x_1, x_2, x_3, x_4, x_5, x_6, x_7, x_8$. Suppose you can find a feasible solution to the
system of equations represented by the above augmented matrix. Thus all variables are nonnegative.
Suppose also that it can be done in such a way that $x_8$ and $x_7$ happen to be 0. Then it will follow
that $x_1, \cdots, x_6$ is a feasible solution for 11.18. Conversely, if you can find a feasible solution for
11.18, then letting $x_7$ and $x_8$ both equal zero, you have obtained a feasible solution to 11.19. Since
all variables are nonnegative, $x_7$ and $x_8$ both equalling zero is equivalent to saying the minimum of
$z = x_7 + x_8$ subject to the constraints represented by the above augmented matrix equals zero. This
has proved the following simple observation.

Observation 11.4.2 There exists a feasible solution to the constraints represented by the augmented
matrix of 11.18 and $x \ge 0$ if and only if the minimum of $x_7 + x_8$ subject to the constraints of 11.19
and $x \ge 0$ exists and equals 0.

Of course a similar observation would hold in other similar situations. Now the point of all this
is that it is trivial to see a feasible solution to 11.19, namely $x_6 = 7,\ x_7 = 3,\ x_8 = 2$ and all the other
variables may be set to equal zero. Therefore, it is easy to find an initial simplex tableau for the
minimization problem just described. First add the column and row for $z$
Next it is necessary to make the last two columns on the bottom left row into simple columns.
Performing the row operation, this yields an initial simplex tableau,
Now the algorithm involves getting rid of the positive entries on the left bottom row. Begin with
the first column. The pivot is the 2. An application of the simplex algorithm yields the new tableau
Now go to the third column. The pivot is the 3/2 in the second row. An application of the simplex
algorithm yields
(11.20)
and you see there are only nonpositive numbers on the bottom left row so the process stops and
yields 0 for the minimum of $z = x_7 + x_8$. As for the other variables, $x_1 = 5/3,\ x_2 = 0,\ x_3 = 1/3,\ x_4 = 0,\ x_5 = 0,\ x_6 = 5$.
Now as explained in the above observation, this is a basic feasible solution for the
original system 11.18.
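
The role of the artificial variables can be checked numerically. The sketch below assumes the augmented matrix of 11.19 has the rows listed in `ROWS_11_19` (the original constraints with slack variables $x_4, x_5, x_6$ and artificial variables $x_7, x_8$); the helper name `satisfies` is made up for this illustration. It confirms that the obvious starting point is feasible for 11.19 with $x_7 + x_8 = 5$, and that the final point is feasible with $x_7 + x_8 = 0$, so its first six entries solve 11.18.

```python
from fractions import Fraction as Fr

# Rows of the system 11.19: coefficients of x1..x8 and the right-hand side.
ROWS_11_19 = [
    ([2, 1, -1, -1, 0, 0, 1, 0], 3),
    ([1, 1, 1, 0, -1, 0, 0, 1], 2),
    ([1, 1, 1, 0, 0, 1, 0, 0], 7),
]


def satisfies(x):
    """True when x >= 0 and every equation of 11.19 holds exactly."""
    if any(v < 0 for v in x):
        return False
    return all(sum(a * v for a, v in zip(coeffs, x)) == rhs
               for coeffs, rhs in ROWS_11_19)


# Obvious feasible point of 11.19: x6 = 7, x7 = 3, x8 = 2, the rest zero.
start = [0, 0, 0, 0, 0, 7, 3, 2]
# Point reached by the simplex algorithm in the text.
final = [Fr(5, 3), 0, Fr(1, 3), 0, 0, 5, 0, 0]

for point in (start, final):
    print(satisfies(point), point[6] + point[7])
```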

Now consider a maximization problem associated with the above constraints. 

Example 11.4.3 Maximize $x_1 - x_2 + 2x_3$ subject to the constraints $2x_1 + x_2 - x_3 \ge 3$, $x_1 + x_2 + x_3 \ge 2$,
$x_1 + x_2 + x_3 \le 7$ and $x \ge 0$.

From 11.20 you can immediately assemble an initial simplex tableau. You begin with the first 6
columns and top 3 rows in 11.20. Then add in the column and row for $z$. This yields
and you first do row operations to make the first and third columns simple columns. Thus the next
simplex tableau is
You are trying to get rid of negative entries in the bottom left row. There is only one, the -5/3.
The pivot is the 1. The next simplex tableau is then
and so the maximum value of $z$ is 32/3 and it occurs when $x_1 = 10/3,\ x_2 = 0$ and $x_3 = 11/3$.

11.5 Duality 

You can solve minimization problems by solving maximization problems. You can also go the other 
direction and solve maximization problems by minimization problems. Sometimes this makes things 
much easier. To be more specific, the two problems to be considered are 

A.) Minimize $z = cx$ subject to $x \ge 0$ and $Ax \ge b$ and

B.) Maximize $w = yb$ such that $y \ge 0$ and $yA \le c$,

(equivalently $A^T y^T \le c^T$ and $w = b^T y^T$).

In these problems it is assumed $A$ is an $m \times p$ matrix.

I will show how a solution of the first yields a solution of the second and then show how a solution 
of the second yields a solution of the first. The problems, A.) and B.) are called dual problems. 

Lemma 11.5.1 Let $x$ be a solution of the inequalities of A.) and let $y$ be a solution of the inequalities
of B.). Then

$$cx \ge yb,$$

and if equality holds in the above, then $x$ is a solution to A.) and $y$ is a solution to B.).
Proof: This follows immediately. Since $c \ge yA$ and $x \ge 0$, and since $Ax \ge b$ and $y \ge 0$,

$$cx \ge yAx \ge yb.$$

Now suppose equality holds, so that $cx = yb$. If $x'$ is any other vector satisfying the inequalities
of A.), the inequality just shown gives $cx' \ge yb = cx$, so $x$ is a solution of A.). In the same way,
if $y'$ satisfies the inequalities of B.) then $y'b \le cx = yb$, so $y$ is a
solution of B.). ■
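
The chain of inequalities in the proof can be watched on a small instance. The numbers below are a made-up example, not one from the text: `A`, `b`, `c`, `x` and `y` are chosen only so that $x$ satisfies the inequalities of A.) and $y$ satisfies those of B.).

```python
def dot(u, v):
    """Dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def mat_vec(A, x):
    """The column vector A x."""
    return [dot(row, x) for row in A]


def vec_mat(y, A):
    """The row vector y A."""
    return [dot(y, col) for col in zip(*A)]


# Hypothetical data for problems A.) and B.).
A = [[1, 2], [3, 1]]
b = [4, 6]
c = [5, 4]
x = [2, 1]   # A x = (4, 7) >= b and x >= 0
y = [1, 1]   # y A = (4, 3) <= c and y >= 0

primal_ok = all(v >= 0 for v in x) and all(l >= r for l, r in zip(mat_vec(A, x), b))
dual_ok = all(v >= 0 for v in y) and all(l <= r for l, r in zip(vec_mat(y, A), c))

cx, yAx, yb = dot(c, x), dot(y, mat_vec(A, x)), dot(y, b)
print(primal_ok, dual_ok, cx, yAx, yb)
```

Here the three numbers come out in decreasing order, as the lemma requires; neither point is optimal because the inequalities are strict.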

Now recall that to solve either of these problems using the simplex method, you first add in slack 
variables. Denote by x' and y' the enlarged list of variables. Thus x' has at least m entries and so 
does y' and the inequalities involving A were replaced by equalities whose augmented matrices were 



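of the form

$$\left(\begin{array}{ccc} A & -I & b \end{array}\right), \text{ and } \left(\begin{array}{ccc} A^T & I & c^T \end{array}\right).$$

Then you included the row and column for $z$ and $w$ to obtain

$$\left(\begin{array}{cccc} A & -I & 0 & b \\ -c & 0 & 1 & 0 \end{array}\right) \text{ and } \left(\begin{array}{cccc} A^T & I & 0 & c^T \\ -b^T & 0 & 1 & 0 \end{array}\right). \tag{11.21}$$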

Then the problems have basic feasible solutions if it is possible to permute the first $p + m$ columns
in the above two matrices and obtain matrices of the form

$$\left(\begin{array}{cccc} B & F & 0 & b \\ -c_B & -c_F & 1 & 0 \end{array}\right) \text{ and } \left(\begin{array}{cccc} B_1 & F_1 & 0 & c^T \\ -b_{B_1}^T & -b_{F_1}^T & 1 & 0 \end{array}\right) \tag{11.22}$$

where $B, B_1$ are invertible $m \times m$ and $p \times p$ matrices and denoting the variables associated with
these columns by $x_B, y_B$ and those variables associated with $F$ or $F_1$ by $x_F$ and $y_F$, it follows that
letting $Bx_B = b$ and $x_F = 0$, the resulting vector $x'$ is a solution to $x' \ge 0$ and $\left(\begin{array}{cc} A & -I \end{array}\right) x' = b$
with similar constraints holding for $y'$. In other words, it is possible to obtain simplex tableaus,



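$$\left(\begin{array}{cccc} I & B^{-1}F & 0 & B^{-1}b \\ 0 & c_B B^{-1}F - c_F & 1 & c_B B^{-1}b \end{array}\right), \quad \left(\begin{array}{cccc} I & B_1^{-1}F_1 & 0 & B_1^{-1}c^T \\ 0 & b_{B_1}^T B_1^{-1}F_1 - b_{F_1}^T & 1 & b_{B_1}^T B_1^{-1}c^T \end{array}\right) \tag{11.23}$$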



Similar considerations apply to the second problem. Thus as just described, a basic feasible solution 
is one which determines a simplex tableau like the above in which you get a feasible solution by 
setting all but the first m variables equal to zero. The simplex algorithm takes you from one basic 
feasible solution to another till eventually, if there is no degeneracy, you obtain a basic feasible 
solution which yields the solution of the problem of interest. 

Theorem 11.5.2 Suppose there exists a solution, $x$ to A.) where $x$ is a basic feasible solution of
the inequalities of A.). Then there exists a solution, $y$ to B.) and $cx = yb$. It is also possible to
find $y$ from $x$ using a simple formula.

Proof: Since the solution to A.) is basic and feasible, there exists a simplex tableau like 11.23
such that $x'$ can be split into $x_B$ and $x_F$ such that $x_F = 0$ and $x_B = B^{-1}b$. Now since it is
a minimizer, it follows $c_B B^{-1}F - c_F \le 0$ and the minimum value for $cx$ is $c_B B^{-1}b$. Stating
this again, $cx = c_B B^{-1}b$. Is it possible you can take $y = c_B B^{-1}$? From Lemma 11.5.1 this will be
so if $c_B B^{-1}$ solves the constraints of problem B.). Is $c_B B^{-1} \ge 0$? Is $c_B B^{-1}A \le c$? These two
conditions are satisfied if and only if $c_B B^{-1}\left(\begin{array}{cc} A & -I \end{array}\right) \le \left(\begin{array}{cc} c & 0 \end{array}\right)$. Referring to the process of
permuting the columns of the first augmented matrix of 11.21 to get 11.22 and doing the same
permutations on the columns of $\left(\begin{array}{cc} A & -I \end{array}\right)$ and $\left(\begin{array}{cc} c & 0 \end{array}\right)$, the desired inequality holds if and only
if $c_B B^{-1}\left(\begin{array}{cc} B & F \end{array}\right) \le \left(\begin{array}{cc} c_B & c_F \end{array}\right)$ which is equivalent to saying $\left(\begin{array}{cc} c_B & c_B B^{-1}F \end{array}\right) \le \left(\begin{array}{cc} c_B & c_F \end{array}\right)$
and this is true because $c_B B^{-1}F - c_F \le 0$ due to the assumption that $x$ is a minimizer. The simple
formula is just

$$y = c_B B^{-1}. \ ■$$

The proof of the following corollary is similar. 

Corollary 11.5.3 Suppose there exists a solution, $y$ to B.) where $y$ is a basic feasible solution of
the inequalities of B.). Then there exists a solution, $x$ to A.) and $cx = yb$. It is also possible to
find $x$ from $y$ using a simple formula. In this case, and referring to 11.23, the simple formula is
$x = B_1^{-T} b_{B_1}$.

As an example, consider the pig farmers problem. The main difficulty in this problem was finding 
an initial simplex tableau. Now consider the following example and marvel at how all the difficulties 
disappear. 

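Example 11.5.4 Minimize $C = 2x_1 + 3x_2 + 2x_3 + 3x_4$ subject to the constraints

$$\begin{array}{l}
x_1 + 2x_2 + x_3 + 3x_4 \ge 5, \\
5x_1 + 3x_2 + 2x_3 + x_4 \ge 8, \\
x_1 + 2x_2 + 2x_3 + x_4 \ge 6, \\
2x_1 + x_2 + x_3 + x_4 \ge 7, \\
x_1 + x_2 + x_3 + x_4 \ge 4,
\end{array}$$

where each $x_i \ge 0$.
Here the dual problem is to maximize $w = 5y_1 + 8y_2 + 6y_3 + 7y_4 + 4y_5$ subject to the constraints

$$\begin{array}{l}
y_1 + 5y_2 + y_3 + 2y_4 + y_5 \le 2, \\
2y_1 + 3y_2 + 2y_3 + y_4 + y_5 \le 3, \\
y_1 + 2y_2 + 2y_3 + y_4 + y_5 \le 2, \\
3y_1 + y_2 + y_3 + y_4 + y_5 \le 3.
\end{array}$$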
Adding in slack variables, these inequalities are equivalent to the system of equations whose
augmented matrix is
Now the obvious solution is feasible so there is no hunting for an initial obvious feasible solution
required. Now add in the row and column for $w$. This yields
It is a maximization problem so you want to eliminate the negatives in the bottom left row. Pick
the column having the one which is most negative, the -8. The pivot is the top 5. Then apply the
simplex algorithm to obtain
There are still negative entries in the bottom left row. Do the simplex algorithm to the column
which has the -22/5. The pivot is the 8/5. This yields
and there are still negative numbers. Pick the column which has the -13/4. The pivot is the 3/8 in
the top. This yields
which has only one negative entry on the bottom left. The pivot for this first column is the 7/3. The
next tableau is
and all the entries in the left bottom row are nonnegative so the answer is 64/7. This is the same
as obtained before. So what values for $x$ are needed? Here the basic variables are $y_1, y_3, y_4, y_7$.
Consider the original augmented matrix, one step before the simplex tableau.
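Permute the columns to put the columns associated with these basic variables on the left.
The matrix $B$ is

$$B = \left(\begin{array}{cccc} 1 & 1 & 2 & 0 \\ 2 & 2 & 1 & 1 \\ 1 & 2 & 1 & 0 \\ 3 & 1 & 1 & 0 \end{array}\right)$$

and so $B^{-T}$ equals

$$\left(\begin{array}{cccc} -\frac{1}{7} & -\frac{2}{7} & \frac{5}{7} & \frac{1}{7} \\ 0 & 0 & 0 & 1 \\ -\frac{1}{7} & \frac{5}{7} & -\frac{2}{7} & -\frac{6}{7} \\ \frac{3}{7} & -\frac{1}{7} & -\frac{1}{7} & -\frac{3}{7} \end{array}\right).$$

Also $b_B^T = \left(\begin{array}{cccc} 5 & 6 & 7 & 0 \end{array}\right)$ and so from Corollary 11.5.3,

$$x = \left(\begin{array}{cccc} -\frac{1}{7} & -\frac{2}{7} & \frac{5}{7} & \frac{1}{7} \\ 0 & 0 & 0 & 1 \\ -\frac{1}{7} & \frac{5}{7} & -\frac{2}{7} & -\frac{6}{7} \\ \frac{3}{7} & -\frac{1}{7} & -\frac{1}{7} & -\frac{3}{7} \end{array}\right) \left(\begin{array}{c} 5 \\ 6 \\ 7 \\ 0 \end{array}\right) = \left(\begin{array}{c} \frac{18}{7} \\ 0 \\ \frac{11}{7} \\ \frac{2}{7} \end{array}\right)$$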



which agrees with the original way of doing the problem. 
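
The whole dual computation can be replayed with exact fractions. What follows is a minimal sketch of the simplex algorithm for the special case used here (maximize, all constraints of the form $\le$ with nonnegative right sides, so the obvious solution is feasible); it is not a general-purpose solver, it ignores degeneracy, and the function name `simplex_max` is made up for this illustration. It picks the most negative bottom-row entry and the smallest ratio in that column, as in the text. At the end it reads $x$ from the bottom row under the slack columns and confirms that this $x$ satisfies $B^T x = b_B$, which is the formula of Corollary 11.5.3.

```python
from fractions import Fraction as Fr


def simplex_max(M, rhs, obj):
    """Maximize obj.y subject to M y <= rhs, y >= 0, assuming rhs >= 0.

    Returns (optimal value, basic column index for each row, final tableau).
    The column for the objective variable is left implicit.
    """
    m, n = len(M), len(M[0])
    T = [[Fr(v) for v in M[i]] + [Fr(int(i == j)) for j in range(m)] + [Fr(rhs[i])]
         for i in range(m)]
    T.append([Fr(-v) for v in obj] + [Fr(0)] * (m + 1))
    basis = [n + i for i in range(m)]            # the slack columns start out basic
    while True:
        col = min(range(n + m), key=lambda j: T[-1][j])   # most negative bottom entry
        if T[-1][col] >= 0:
            return T[-1][-1], basis, T
        rows = [i for i in range(m) if T[i][col] > 0]
        if not rows:
            raise ValueError("objective is unbounded")
        row = min(rows, key=lambda i: T[i][-1] / T[i][col])  # smallest ratio gives the pivot
        pivot = T[row][col]
        T[row] = [v / pivot for v in T[row]]
        for i in range(m + 1):
            if i != row and T[i][col] != 0:
                factor = T[i][col]
                T[i] = [a - factor * p for a, p in zip(T[i], T[row])]
        basis[row] = col


# Data of Example 11.5.4.
A = [[1, 2, 1, 3], [5, 3, 2, 1], [1, 2, 2, 1], [2, 1, 1, 1], [1, 1, 1, 1]]
b = [5, 8, 6, 7, 4]
c = [2, 3, 2, 3]
At = [list(col) for col in zip(*A)]              # dual constraints: A^T y <= c

value, basis, T = simplex_max(At, c, b)
x = T[-1][len(b):len(b) + len(c)]                # bottom row under the slack columns

# Check B^T x = b_B, one equation per basic column of (A^T  I).
ext = [At[i] + [int(i == k) for k in range(len(c))] for i in range(len(c))]
b_ext = b + [0] * len(c)
checks = [sum(ext[i][j] * x[i] for i in range(len(c))) == b_ext[j] for j in basis]

print(value, sorted(basis), x, all(checks))
```

Column indices are counted from 0, so the basic columns 0, 2, 3, 6 correspond to $y_1, y_3, y_4, y_7$.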

Two good books which give more discussion of linear programming are Strang [15] and Nobel 
and Daniels [12]. Also listed in these books are other references which may prove useful if you are 
interested in seeing more on these topics. There is a great deal more which can be said about linear 
programming. 
11.6 Exercises

1. Maximize and minimize $z = x_1 - 2x_2 + x_3$ subject to the constraints $x_1 + x_2 + x_3 \le 10$,
$x_1 + x_2 + x_3 \ge 2$, and $x_1 + 2x_2 + x_3 \le 7$ if possible. All variables are nonnegative.

2. Maximize and minimize the following if possible. All variables are nonnegative.

(a) $z = x_1 - 2x_2$ subject to the constraints $x_1 + x_2 + x_3 \le 10$, $x_1 + x_2 + x_3 \ge 1$, and
$x_1 + 2x_2 + x_3 \le 7$

(b) $z = x_1 - 2x_2 - 3x_3$ subject to the constraints $x_1 + x_2 + x_3 \le 8$, $x_1 + x_2 + 3x_3 \ge 1$, and
$x_1 + x_2 + x_3 \le 7$

(c) $z = 2x_1 + x_2$ subject to the constraints $x_1 - x_2 + x_3 \le 10$, $x_1 + x_2 + x_3 \ge 1$, and
$x_1 + 2x_2 + x_3 \le 7$

(d) $z = x_1 + 2x_2$ subject to the constraints $x_1 - x_2 + x_3 \le 10$, $x_1 + x_2 + x_3 \ge 1$, and
$x_1 + 2x_2 + x_3 \le 7$

3. Consider contradictory constraints, $x_1 + x_2 \ge 12$ and $x_1 + 2x_2 \le 5$, $x_1 \ge 0$, $x_2 \ge 0$. You know
these two contradict but show they contradict using the simplex algorithm.

4. Find a solution to the following inequalities for $x, y \ge 0$ if it is possible to do so. If it is not
possible, prove it is not possible.

(a) $6x + 3y \ge 4$
    $8x + 4y \le 5$

(b) $6x_1 + 4x_3 \le 11$
    $5x_1 + 4x_2 + 4x_3 \ge 8$
    $6x_1 + 6x_2 + 5x_3 \le 11$

(c) $6x_1 + 4x_3 \le 11$
    $5x_1 + 4x_2 + 4x_3 \ge 9$
    $6x_1 + 6x_2 + 5x_3 \le 9$

(d) $x_1 - x_2 + x_3 \le 2$
    $x_1 + 2x_2 \ge 4$
    $3x_1 + 2x_3 \le 7$

(e) $5x_1 - 2x_2 + 4x_3 \le 1$
    $6x_1 - 3x_2 + 5x_3 \ge 2$
    $5x_1 - 2x_2 + 4x_3 \le 5$

5. Minimize $z = x_1 + x_2$ subject to $x_1 + x_2 \ge 2$, $x_1 + 3x_2 \le 20$, $x_1 + x_2 \le 18$. Change to a
maximization problem and solve as follows: Let $y_i = M - x_i$. Formulate in terms of $y_1, y_2$.

Spectral Theory

12.1 Eigenvalues And Eigenvectors Of A Matrix 

Spectral Theory refers to the study of eigenvalues and eigenvectors of a matrix. It is of fundamental 
importance in many areas. Row operations will no longer be such a useful tool in this subject. 

12.1.1 Definition Of Eigenvectors And Eigenvalues 

In this section, F = C. 

To illustrate the idea behind what will be discussed, consider the following example. 



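Example 12.1.1 Here is a matrix.

$$\left(\begin{array}{ccc} 0 & 5 & -10 \\ 0 & 22 & 16 \\ 0 & -9 & -2 \end{array}\right)$$

Multiply this matrix by the vector

$$\left(\begin{array}{c} -5 \\ -4 \\ 3 \end{array}\right)$$

and see what happens. Then multiply it by

$$\left(\begin{array}{c} 1 \\ 0 \\ 0 \end{array}\right)$$

and see what happens. Does this matrix act this way for some other vector?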
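First

$$\left(\begin{array}{ccc} 0 & 5 & -10 \\ 0 & 22 & 16 \\ 0 & -9 & -2 \end{array}\right) \left(\begin{array}{c} -5 \\ -4 \\ 3 \end{array}\right) = \left(\begin{array}{c} -50 \\ -40 \\ 30 \end{array}\right) = 10 \left(\begin{array}{c} -5 \\ -4 \\ 3 \end{array}\right).$$

Next

$$\left(\begin{array}{ccc} 0 & 5 & -10 \\ 0 & 22 & 16 \\ 0 & -9 & -2 \end{array}\right) \left(\begin{array}{c} 1 \\ 0 \\ 0 \end{array}\right) = \left(\begin{array}{c} 0 \\ 0 \\ 0 \end{array}\right) = 0 \left(\begin{array}{c} 1 \\ 0 \\ 0 \end{array}\right).$$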

When you multiply the first vector by the given matrix, it stretched the vector, multiplying it by 
10. When you multiplied the matrix by the second vector it sent it to the zero vector. Now consider 

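$$\left(\begin{array}{ccc} 0 & 5 & -10 \\ 0 & 22 & 16 \\ 0 & -9 & -2 \end{array}\right) \left(\begin{array}{c} 1 \\ 1 \\ 1 \end{array}\right) = \left(\begin{array}{c} -5 \\ 38 \\ -11 \end{array}\right).$$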

In this case, multiplication by the matrix did not result in merely multiplying the vector by a number. 

In the above example, the first two vectors were called eigenvectors and the numbers, 10 and 0
are called eigenvalues. Not every number is an eigenvalue and not every vector is an eigenvector.
When you have a nonzero vector which, when multiplied by a matrix results in another vector
which is parallel to the first or equal to 0, this vector is called an eigenvector of the matrix. This is
the meaning when the vectors are in $\mathbb{R}^n$. Things are less apparent geometrically when the vectors
are in $\mathbb{C}^n$. The precise definition in all cases follows.

Definition 12.1.2 Let $M$ be an $n \times n$ matrix and let $x \in \mathbb{C}^n$ be a nonzero vector for which

$$Mx = \lambda x \tag{12.1}$$

for some scalar $\lambda$. Then $x$ is called an eigenvector and $\lambda$ is called an eigenvalue (characteristic
value) of the matrix $M$.
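
The definition translates directly into a test one can run on a candidate vector. The sketch below uses the matrix of Example 12.1.1, taken here with a first column of zeros, which is what the results stated in that example require; the function names `mat_vec` and `eigenvalue_of` are made up for this illustration.

```python
from fractions import Fraction


def mat_vec(M, x):
    """The product M x for a square matrix stored as a list of rows."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def eigenvalue_of(M, x):
    """Return the scalar lam with M x = lam x, or None if x is not an eigenvector.

    The zero vector is rejected, since eigenvectors are never equal to zero.
    """
    if all(v == 0 for v in x):
        return None
    y = mat_vec(M, x)
    k = next(i for i, v in enumerate(x) if v != 0)   # first nonzero entry of x
    lam = Fraction(y[k], x[k])
    return lam if all(yi == lam * xi for yi, xi in zip(y, x)) else None


M = [[0, 5, -10], [0, 22, 16], [0, -9, -2]]
for x in ([-5, -4, 3], [1, 0, 0], [1, 1, 1], [0, 0, 0]):
    print(x, eigenvalue_of(M, x))
```

Note that the eigenvalue 0 and the answer `None` are different outcomes: the first means $Mx = 0x$ for a nonzero $x$, the second means $x$ is not an eigenvector at all.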



Note: Eigenvectors are never equal to zero ! 



The set of all eigenvalues of an $n \times n$ matrix $M$ is denoted by $\sigma(M)$ and is referred to as the
spectrum of $M$.

The eigenvectors of a matrix $M$ are those vectors $x$ for which multiplication by $M$ results in a
vector in the same direction or opposite direction to $x$. Since the zero vector has no direction this
would make no sense for the zero vector. As noted above, $0$ is never allowed to be an eigenvector.
How can eigenvectors be identified? Suppose $x$ satisfies 12.1. Then

$$(M - \lambda I)x = 0$$

for some $x \ne 0$. (Equivalently, you could write $(\lambda I - M)x = 0$.) Sometimes we will use

$$(\lambda I - M)x = 0$$

and sometimes $(M - \lambda I)x = 0$. It makes absolutely no difference and you should use whichever you
like better. Therefore, the matrix $M - \lambda I$ cannot have an inverse because if it did, the equation
could be solved,

$$x = \left((M - \lambda I)^{-1}(M - \lambda I)\right)x = (M - \lambda I)^{-1}\left((M - \lambda I)x\right) = (M - \lambda I)^{-1}0 = 0,$$

and this would require $x = 0$, contrary to the requirement that $x \ne 0$. By Theorem 6.2.1 on Page
134,

$$\det(M - \lambda I) = 0. \tag{12.2}$$

(Equivalently you could write $\det(\lambda I - M) = 0$.) The expression, $\det(\lambda I - M)$ or equivalently,
$\det(M - \lambda I)$ is a polynomial called the characteristic polynomial and the above equation is
called the characteristic equation. For $M$ an $n \times n$ matrix, it follows from the theorem on expanding
a matrix by its cofactor that $\det(M - \lambda I)$ is a polynomial of degree $n$. As such, the equation 12.2
has a solution, $\lambda \in \mathbb{C}$ by the fundamental theorem of algebra. Is it actually an eigenvalue? The
answer is yes, and this follows from Observation 9.2.7 on Page 219 along with Theorem 6.2.1 on
Page 134. Since $\det(M - \lambda I) = 0$ the matrix $M - \lambda I$ cannot be one to one and so there exists
a nonzero vector $x$ such that $(M - \lambda I)x = 0$. This proves the following corollary.

Corollary 12.1.3 Let $M$ be an $n \times n$ matrix and $\det(M - \lambda I) = 0$. Then there exists a nonzero
vector $x \in \mathbb{C}^n$ such that $(M - \lambda I)x = 0$.

12.1.2 Finding Eigenvectors And Eigenvalues 

As an example, consider the following.

Example 12.1.4 Find the eigenvalues and eigenvectors for the matrix

$$A = \left(\begin{array}{ccc} 5 & -10 & -5 \\ 2 & 14 & 2 \\ -4 & -8 & 6 \end{array}\right)$$

You first need to identify the eigenvalues. Recall this requires the solution of the equation

$$\det(A - \lambda I) = 0.$$

In this case this equation is

$$\det\left(\left(\begin{array}{ccc} 5 & -10 & -5 \\ 2 & 14 & 2 \\ -4 & -8 & 6 \end{array}\right) - \lambda\left(\begin{array}{ccc} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{array}\right)\right) = 0$$

When you expand this determinant and simplify, you find the equation you need to solve is

$$(\lambda - 5)(\lambda^2 - 20\lambda + 100) = 0$$

and so the eigenvalues are

$$5, 10, 10.$$

We have listed 10 twice because it is a zero of multiplicity two due to

$$\lambda^2 - 20\lambda + 100 = (\lambda - 10)^2.$$
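
The expansion of the determinant is easy to get wrong by hand, so here is a small check. For a $3 \times 3$ matrix, $\det(\lambda I - A) = \lambda^3 - (\text{trace})\lambda^2 + (\text{sum of the principal } 2 \times 2 \text{ minors})\lambda - \det A$; the sketch below uses this standard identity, which is not derived in the text, and the function names are made up for the illustration.

```python
def det3(M):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def char_poly3(M):
    """Coefficients (c2, c1, c0) with det(lam*I - M) = lam^3 + c2*lam^2 + c1*lam + c0."""
    trace = M[0][0] + M[1][1] + M[2][2]
    minors = ((M[0][0] * M[1][1] - M[0][1] * M[1][0])
              + (M[0][0] * M[2][2] - M[0][2] * M[2][0])
              + (M[1][1] * M[2][2] - M[1][2] * M[2][1]))
    return (-trace, minors, -det3(M))


def evaluate(coeffs, lam):
    """Value of the monic cubic with the given lower coefficients at lam."""
    c2, c1, c0 = coeffs
    return lam ** 3 + c2 * lam ** 2 + c1 * lam + c0


A = [[5, -10, -5], [2, 14, 2], [-4, -8, 6]]
coeffs = char_poly3(A)
print(coeffs, evaluate(coeffs, 5), evaluate(coeffs, 10))
```

The coefficients agree with multiplying out $(\lambda - 5)(\lambda - 10)^2 = \lambda^3 - 25\lambda^2 + 200\lambda - 500$.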

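Having found the eigenvalues, it only remains to find the eigenvectors. First find the eigenvectors
for $\lambda = 5$. As explained above, this requires you to solve the equation,

$$\left(\left(\begin{array}{ccc} 5 & -10 & -5 \\ 2 & 14 & 2 \\ -4 & -8 & 6 \end{array}\right) - 5\left(\begin{array}{ccc} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{array}\right)\right)\left(\begin{array}{c} x \\ y \\ z \end{array}\right) = \left(\begin{array}{c} 0 \\ 0 \\ 0 \end{array}\right).$$

That is you need to find the solution to

$$\left(\begin{array}{ccc} 0 & -10 & -5 \\ 2 & 9 & 2 \\ -4 & -8 & 1 \end{array}\right)\left(\begin{array}{c} x \\ y \\ z \end{array}\right) = \left(\begin{array}{c} 0 \\ 0 \\ 0 \end{array}\right).$$

By now this is an old problem. You set up the augmented matrix and row reduce to get the solution.
Thus the matrix you must row reduce is

$$\left(\begin{array}{ccc|c} 0 & -10 & -5 & 0 \\ 2 & 9 & 2 & 0 \\ -4 & -8 & 1 & 0 \end{array}\right). \tag{12.3}$$

The row reduced echelon form is

$$\left(\begin{array}{ccc|c} 1 & 0 & -\frac{5}{4} & 0 \\ 0 & 1 & \frac{1}{2} & 0 \\ 0 & 0 & 0 & 0 \end{array}\right)$$

and so the solution is any vector of the form

$$\left(\begin{array}{c} \frac{5}{4}t \\ -\frac{1}{2}t \\ t \end{array}\right) = t\left(\begin{array}{c} \frac{5}{4} \\ -\frac{1}{2} \\ 1 \end{array}\right)$$

where $t \in \mathbb{F}$. You would obtain the same collection of vectors if you replaced $t$ with $4t$. Thus a
simpler description for the solutions to this system of equations whose augmented matrix is in 12.3
is

$$t\left(\begin{array}{c} 5 \\ -2 \\ 4 \end{array}\right) \tag{12.4}$$

where $t \in \mathbb{F}$. Now you need to remember that you can't take $t = 0$ because this would result in the
zero vector and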



Eigenvectors are never equal to zero ! 



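Other than this value, every other choice of $t$ in 12.4 results in an eigenvector. It is a good idea to
check your work! To do so, we will take the original matrix and multiply by this vector and see if
we get 5 times this vector.

$$\left(\begin{array}{ccc} 5 & -10 & -5 \\ 2 & 14 & 2 \\ -4 & -8 & 6 \end{array}\right)\left(\begin{array}{c} 5 \\ -2 \\ 4 \end{array}\right) = \left(\begin{array}{c} 25 \\ -10 \\ 20 \end{array}\right) = 5\left(\begin{array}{c} 5 \\ -2 \\ 4 \end{array}\right)$$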

so it appears this is correct. Always check your work on these problems if you care about getting 
the answer right. 

The parameter, $t$ is sometimes called a free variable. The set of vectors in 12.4 is called the
eigenspace and it equals $\ker(A - \lambda I)$. You should observe that in this case the eigenspace has
dimension 1 because the eigenspace is the span of a single vector. In general, you obtain the solution
from the row echelon form and the number of different free variables gives you the dimension of the
eigenspace. Just remember that not every vector in the eigenspace is an eigenvector. The vector
$0$ is not an eigenvector although it is in the eigenspace because



Eigenvectors are never equal to zero ! 
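Next consider the eigenvectors for $\lambda = 10$. These vectors are solutions to the equation,

$$\left(\left(\begin{array}{ccc} 5 & -10 & -5 \\ 2 & 14 & 2 \\ -4 & -8 & 6 \end{array}\right) - 10\left(\begin{array}{ccc} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{array}\right)\right)\left(\begin{array}{c} x \\ y \\ z \end{array}\right) = \left(\begin{array}{c} 0 \\ 0 \\ 0 \end{array}\right).$$

That is you must find the solutions to

$$\left(\begin{array}{ccc} -5 & -10 & -5 \\ 2 & 4 & 2 \\ -4 & -8 & -4 \end{array}\right)\left(\begin{array}{c} x \\ y \\ z \end{array}\right) = \left(\begin{array}{c} 0 \\ 0 \\ 0 \end{array}\right)$$

which reduces to consideration of the augmented matrix

$$\left(\begin{array}{ccc|c} -5 & -10 & -5 & 0 \\ 2 & 4 & 2 & 0 \\ -4 & -8 & -4 & 0 \end{array}\right).$$

The row reduced echelon form for this matrix is

$$\left(\begin{array}{ccc|c} 1 & 2 & 1 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right)$$

and so the eigenvectors are of the form

$$\left(\begin{array}{c} -2s - t \\ s \\ t \end{array}\right) = s\left(\begin{array}{c} -2 \\ 1 \\ 0 \end{array}\right) + t\left(\begin{array}{c} -1 \\ 0 \\ 1 \end{array}\right).$$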
You can't pick t and s both equal to zero because this would result in the zero vector and 



Eigenvectors are never equal to zero ! 



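However, every other choice of $t$ and $s$ does result in an eigenvector for the eigenvalue $\lambda = 10$. As in
the case for $\lambda = 5$ you should check your work if you care about getting it right.

$$\left(\begin{array}{ccc} 5 & -10 & -5 \\ 2 & 14 & 2 \\ -4 & -8 & 6 \end{array}\right)\left(\begin{array}{c} -1 \\ 0 \\ 1 \end{array}\right) = \left(\begin{array}{c} -10 \\ 0 \\ 10 \end{array}\right) = 10\left(\begin{array}{c} -1 \\ 0 \\ 1 \end{array}\right)$$

so it worked. The other vector will also work. Check it.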

12.1.3 A Warning 

The above example shows how to find eigenvectors and eigenvalues algebraically. You may have
noticed it is a bit long. Sometimes students try to first row reduce the matrix before looking
for eigenvalues. This is a terrible idea because row operations destroy the eigenvalues. The
eigenvalue problem is really not about row operations.
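
To see concretely how a row operation destroys the eigenvalues, consider a small made-up example (not from the text). For a $2 \times 2$ matrix the characteristic polynomial is $\lambda^2 - (\text{trace})\lambda + \det$, so it is enough to watch the trace and determinant. The matrix `A` below has eigenvalues 1 and 3; after subtracting half of the first row from the second, the resulting matrix `R` has eigenvalues 2 and 3/2.

```python
from fractions import Fraction as Fr


def trace_and_det(M):
    """Return (trace, determinant) of a 2 x 2 matrix."""
    (a, b), (c, d) = M
    return a + d, a * d - b * c


def char_poly_value(M, lam):
    """Value of lam^2 - trace*lam + det, the characteristic polynomial of M, at lam."""
    t, d = trace_and_det(M)
    return lam * lam - t * lam + d


A = [[Fr(2), Fr(1)], [Fr(1), Fr(2)]]
# Row operation: subtract 1/2 of the first row from the second row.
R = [A[0], [A[1][j] - Fr(1, 2) * A[0][j] for j in range(2)]]

print(trace_and_det(A), [char_poly_value(A, lam) for lam in (1, 3)])
print(trace_and_det(R), [char_poly_value(R, lam) for lam in (1, 3)])
```

This particular row operation preserves the determinant, yet the trace changes from 4 to 7/2, so the characteristic polynomial and its roots change.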

The general eigenvalue problem is the hardest problem in algebra and people still do research 
on ways to find eigenvalues and their eigenvectors. If you are doing anything which would yield a 
way to find eigenvalues and eigenvectors for general matrices without too much trouble, the thing 
you are doing will certainly be wrong. The problems you will see in this book are not too hard 
because they are cooked up to be easy. General methods to compute eigenvalues and eigenvectors 
numerically are presented later. These methods work even when the problem is not cooked up to 
be easy. 

If you are so fortunate as to find the eigenvalues as in the above example, then finding the
eigenvectors does reduce to row operations and this part of the problem is easy. However, finding
the eigenvalues along with the eigenvectors is anything but easy because for an $n \times n$ matrix, it
involves solving a polynomial equation of degree $n$. If you only find a good approximation to the
eigenvalue, it won't work. It either is or is not an eigenvalue and if it is not, the only solution to the
equation, $(M - \lambda I)x = 0$ will be the zero solution as explained above and



Eigenvectors are never equal to zero ! 



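Here is another example.
Example 12.1.5 Let

$$A = \left(\begin{array}{ccc} 2 & 2 & -2 \\ 1 & 3 & -1 \\ -1 & 1 & 1 \end{array}\right)$$

First find the eigenvalues,

$$\det\left(\left(\begin{array}{ccc} 2 & 2 & -2 \\ 1 & 3 & -1 \\ -1 & 1 & 1 \end{array}\right) - \lambda\left(\begin{array}{ccc} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{array}\right)\right) = 0.$$

This reduces to $\lambda^3 - 6\lambda^2 + 8\lambda = 0$ and the solutions are 0, 2, and 4.



0 Can be an Eigenvalue!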



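Now find the eigenvectors. For $\lambda = 0$ the augmented matrix for finding the solutions is

$$\left(\begin{array}{ccc|c} 2 & 2 & -2 & 0 \\ 1 & 3 & -1 & 0 \\ -1 & 1 & 1 & 0 \end{array}\right)$$

and the row reduced echelon form is

$$\left(\begin{array}{ccc|c} 1 & 0 & -1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right).$$

Therefore, the eigenvectors are of the form

$$t\left(\begin{array}{c} 1 \\ 0 \\ 1 \end{array}\right)$$

where $t \ne 0$.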

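Next find the eigenvectors for $\lambda = 2$. The augmented matrix for the system of equations needed
to find these eigenvectors is

$$\left(\begin{array}{ccc|c} 0 & 2 & -2 & 0 \\ 1 & 1 & -1 & 0 \\ -1 & 1 & -1 & 0 \end{array}\right)$$

and the row reduced echelon form is

$$\left(\begin{array}{ccc|c} 1 & 0 & 0 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right)$$

and so the eigenvectors are of the form

$$t\left(\begin{array}{c} 0 \\ 1 \\ 1 \end{array}\right)$$

where $t \ne 0$.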

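Finally find the eigenvectors for $\lambda = 4$. The augmented matrix for the system of equations needed
to find these eigenvectors is

$$\left(\begin{array}{ccc|c} -2 & 2 & -2 & 0 \\ 1 & -1 & -1 & 0 \\ -1 & 1 & -3 & 0 \end{array}\right)$$

and the row reduced echelon form is

$$\left(\begin{array}{ccc|c} 1 & -1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right).$$

Therefore, the eigenvectors are of the form

$$t\left(\begin{array}{c} 1 \\ 1 \\ 0 \end{array}\right)$$

where $t \ne 0$.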
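
As with the earlier example, the three answers can be checked by multiplying. The sketch below takes the matrix of Example 12.1.5 and one eigenvector for each of the eigenvalues 0, 2, 4 (with $t = 1$), and also computes the determinant of the matrix whose columns are these eigenvectors; a nonzero determinant shows the three vectors are linearly independent. The helper names are made up for this illustration.

```python
def mat_vec(M, x):
    """The product M x for a matrix stored as a list of rows."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def det3(M):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


A = [[2, 2, -2], [1, 3, -1], [-1, 1, 1]]
# One eigenvector (t = 1) for each eigenvalue found above.
pairs = {0: [1, 0, 1], 2: [0, 1, 1], 4: [1, 1, 0]}

verified = {lam: mat_vec(A, v) == [lam * c for c in v] for lam, v in pairs.items()}

# Matrix whose columns are the three eigenvectors.
columns = [list(row) for row in zip(*pairs.values())]
print(verified, det3(columns))
```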

12.1.4 Triangular Matrices 

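Although it is usually hard to solve the eigenvalue problem, there is a kind of matrix for which this
is not the case. These are the upper or lower triangular matrices. I will illustrate by an example.

Example 12.1.6 Let $A = \left(\begin{array}{ccc} 1 & 2 & 4 \\ 0 & 4 & 7 \\ 0 & 0 & 6 \end{array}\right)$. Find its eigenvalues.

You need to solve

$$0 = \det\left(\left(\begin{array}{ccc} 1 & 2 & 4 \\ 0 & 4 & 7 \\ 0 & 0 & 6 \end{array}\right) - \lambda\left(\begin{array}{ccc} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{array}\right)\right) = \det\left(\begin{array}{ccc} 1 - \lambda & 2 & 4 \\ 0 & 4 - \lambda & 7 \\ 0 & 0 & 6 - \lambda \end{array}\right) = (1 - \lambda)(4 - \lambda)(6 - \lambda).$$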
Thus the eigenvalues are just the diagonal entries of the original matrix. You can see it would work
this way with any such matrix. These matrices are called upper triangular. Stated precisely, a
matrix $A$ is upper triangular if $A_{ij} = 0$ for all $i > j$. Similarly, it is easy to find the eigenvalues for
a lower triangular matrix, one which has all zeros below replaced by zeros above the main diagonal, that is, $A_{ij} = 0$ for all $i < j$.
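
A quick numerical confirmation for the matrix of Example 12.1.6: the sketch evaluates $\det(A - \lambda I)$ at each diagonal entry and at one value which is not on the diagonal. The function names are made up for this illustration.

```python
def det3(M):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def char_det(M, lam):
    """det(M - lam*I) for a 3 x 3 matrix M."""
    shifted = [[M[i][j] - (lam if i == j else 0) for j in range(3)] for i in range(3)]
    return det3(shifted)


A = [[1, 2, 4], [0, 4, 7], [0, 0, 6]]
diagonal = [A[i][i] for i in range(3)]

print([char_det(A, lam) for lam in diagonal], char_det(A, 2))
```

The value at $\lambda = 2$ is $(1-2)(4-2)(6-2) = -8$, so 2 is not an eigenvalue, while each diagonal entry makes the determinant vanish.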

12.1.5 Defective And Nondefective Matrices 

Definition 12.1.7 By the fundamental theorem of algebra, it is possible to write the characteristic
equation in the form

$$(\lambda - \lambda_1)^{r_1}(\lambda - \lambda_2)^{r_2} \cdots (\lambda - \lambda_m)^{r_m} = 0$$

where $r_i$ is some integer no smaller than 1. Thus the eigenvalues are $\lambda_1, \lambda_2, \cdots, \lambda_m$. The algebraic
multiplicity of $\lambda_j$ is defined to be $r_j$.

Example 12.1.8 Consider the matrix

$$A = \left(\begin{array}{ccc} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{array}\right) \tag{12.5}$$

What is the algebraic multiplicity of the eigenvalue $\lambda = 1$?
In this case the characteristic equation is

$$\det(A - \lambda I) = (1 - \lambda)^3 = 0$$

or equivalently,

$$\det(\lambda I - A) = (\lambda - 1)^3 = 0.$$

Therefore, $\lambda$ is of algebraic multiplicity 3.

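Definition 12.1.9 The geometric multiplicity of an eigenvalue is the dimension of the eigenspace,

$$\ker(A - \lambda I).$$

Example 12.1.10 Find the geometric multiplicity of $\lambda = 1$ for the matrix in 12.5.
We need to solve

$$\left(\begin{array}{ccc} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{array}\right)\left(\begin{array}{c} x \\ y \\ z \end{array}\right) = \left(\begin{array}{c} 0 \\ 0 \\ 0 \end{array}\right).$$

The augmented matrix which must be row reduced to get this solution is therefore,

$$\left(\begin{array}{ccc|c} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right).$$

This requires $z = y = 0$ and $x$ is arbitrary. Thus the eigenspace is

$$t\left(\begin{array}{c} 1 \\ 0 \\ 0 \end{array}\right), \ t \in \mathbb{F}.$$

It follows the geometric multiplicity of $\lambda = 1$ is 1.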

Definition 12.1.11 An $n \times n$ matrix is called defective if the geometric multiplicity is not equal to
the algebraic multiplicity for some eigenvalue. Sometimes such an eigenvalue for which the geometric
multiplicity is not equal to the algebraic multiplicity is called a defective eigenvalue. If the geometric
multiplicity for an eigenvalue equals the algebraic multiplicity, the eigenvalue is sometimes referred
to as nondefective.
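
Since the geometric multiplicity is the dimension of $\ker(A - \lambda I)$, it equals $n$ minus the rank of $A - \lambda I$, and the rank can be found by row reduction. The sketch below does this with exact fractions; `rank` and `geometric_multiplicity` are names made up for this illustration. It is applied to the matrix of Example 12.1.4 and to the matrix 12.5, the latter taken to be the upper triangular matrix with ones on the diagonal and on the superdiagonal, which is an assumption consistent with Examples 12.1.8 and 12.1.10.

```python
from fractions import Fraction as Fr


def rank(M):
    """Rank of a matrix, found by Gauss-Jordan elimination with exact fractions."""
    rows = [[Fr(v) for v in r] for r in M]
    rk = 0
    for c in range(len(rows[0])):
        p = next((i for i in range(rk, len(rows)) if rows[i][c] != 0), None)
        if p is None:
            continue                      # no pivot in this column
        rows[rk], rows[p] = rows[p], rows[rk]
        for i in range(len(rows)):
            if i != rk and rows[i][c] != 0:
                f = rows[i][c] / rows[rk][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[rk])]
        rk += 1
    return rk


def geometric_multiplicity(M, lam):
    """Dimension of ker(M - lam*I) for a square matrix M."""
    n = len(M)
    shifted = [[M[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]
    return n - rank(shifted)


A = [[5, -10, -5], [2, 14, 2], [-4, -8, 6]]   # Example 12.1.4
J = [[1, 1, 0], [0, 1, 1], [0, 0, 1]]         # matrix 12.5 (assumed form)

print(geometric_multiplicity(A, 5), geometric_multiplicity(A, 10))
print(geometric_multiplicity(J, 1))
```

For the first matrix the eigenvalue 10 has algebraic and geometric multiplicity 2, so it is nondefective; for the second, the eigenvalue 1 has algebraic multiplicity 3 but geometric multiplicity 1, so the matrix is defective.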

Here is another more interesting example of a defective matrix. 

Example 12.1.12 Let 




Find the eigenvectors and eigenvalues. 

In this case the eigenvalues are 3, 6, 6 where we have listed 6 twice because it is a zero of algebraic
multiplicity two, the characteristic equation being

$$(\lambda - 3)(\lambda - 6)^2 = 0.$$

It remains to find the eigenvectors for these eigenvalues. First consider the eigenvectors for $\lambda = 3$.
You must solve
The augmented matrix is



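and the row reduced echelon form is

$$\left(\begin{array}{ccc|c} 1 & 0 & -1 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right)$$

so the eigenvectors are nonzero vectors of the form

$$\left(\begin{array}{c} t \\ -t \\ t \end{array}\right) = t\left(\begin{array}{c} 1 \\ -1 \\ 1 \end{array}\right)$$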

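Next consider the eigenvectors for $\lambda = 6$. This requires you to solve
and the augmented matrix for this system of equations is
The row reduced echelon form is

$$\left(\begin{array}{ccc|c} 1 & 0 & \frac{1}{8} & 0 \\ 0 & 1 & \frac{1}{4} & 0 \\ 0 & 0 & 0 & 0 \end{array}\right)$$

and so the eigenvectors for $\lambda = 6$ are of the form

$$t\left(\begin{array}{c} -\frac{1}{8} \\ -\frac{1}{4} \\ 1 \end{array}\right)$$

or written more simply,

$$t\left(\begin{array}{c} -1 \\ -2 \\ 8 \end{array}\right)$$

where $t \in \mathbb{F}$.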

Note that in this example the eigenspace for the eigenvalue $\lambda = 6$ is of dimension 1 because there
is only one parameter. However, this eigenvalue is of multiplicity two as a root to the characteristic
equation. Thus this eigenvalue is a defective eigenvalue. However, the eigenvalue 3 is nondefective.
The matrix is defective because it has a defective eigenvalue. 

The word, defective, seems to suggest there is something wrong with the matrix. This is in fact 
the case. Defective matrices are a lot of trouble in applications and we may wish they never occurred. 
However, they do occur as the above example shows. When you study linear systems of differential 
equations, you will have to deal with the case of defective matrices and you will see how awful they 
are. The reason these matrices are so horrible to work with is that it is impossible to obtain a basis 
of eigenvectors. When you study differential equations, solutions to first order systems are expressed
in terms of eigenvectors of a certain matrix times $e^{\lambda t}$ where $\lambda$ is an eigenvalue. In order to obtain a
general solution of this sort, you must have a basis of eigenvectors. For a defective matrix, such a basis 
does not exist and so you have to go to something called generalized eigenvectors. Unfortunately, 
it is never explained in beginning differential equations courses why there are enough generalized 
eigenvectors and eigenvectors to represent the general solution. In fact, this reduces to a difficult 
question in linear algebra equivalent to the existence of something called the Jordan Canonical form 
which is much more difficult than everything discussed in the entire differential equations course. If 
you become interested in this, see Appendix A. 

Ultimately, the algebraic issues which will occur in differential equations are a red herring anyway. 
The real issues relative to existence of solutions to systems of ordinary differential equations are 
analytical, having much more to do with calculus than with linear algebra although this will likely 
not be made clear when you take a beginning differential equations class. 

In terms of algebra, this lack of a basis of eigenvectors says that it is impossible to obtain a 
diagonal matrix which is similar to the given matrix. 
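
To see what a defective eigenvalue looks like numerically, here is a minimal Python sketch. It assumes NumPy is available, and the two 2 x 2 matrices and the helper name `geometric_multiplicity` are hypothetical illustrations, not taken from the text. The dimension of the eigenspace for $\lambda$ is computed as $n$ minus the rank of $A - \lambda I$.

```python
import numpy as np

def geometric_multiplicity(A, lam, tol=1e-9):
    """Dimension of the eigenspace of A for the eigenvalue lam: n - rank(A - lam*I)."""
    n = A.shape[0]
    return n - np.linalg.matrix_rank(A - lam * np.eye(n), tol=tol)

# Hypothetical example: both matrices have characteristic polynomial (lam - 2)^2,
# so the eigenvalue 2 has multiplicity two as a root in each case.
J = np.array([[2.0, 1.0],
              [0.0, 2.0]])   # only one independent eigenvector: defective
D = np.array([[2.0, 0.0],
              [0.0, 2.0]])   # two independent eigenvectors: nondefective

gm_J = geometric_multiplicity(J, 2.0)
gm_D = geometric_multiplicity(D, 2.0)
print("eigenspace dimension for J:", gm_J)
print("eigenspace dimension for D:", gm_D)
```

When the eigenspace dimension is smaller than the multiplicity of the root, there are too few eigenvectors to form a basis, which is exactly the obstruction to diagonalization described above.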

Although there may be repeated roots to the characteristic equation (12.2), and it is not known
whether the matrix is defective in this case, there is an important theorem which holds when
considering eigenvectors which correspond to distinct eigenvalues.

Theorem 12.1.13 Suppose $M\mathbf{v}_i = \lambda_i \mathbf{v}_i$, $i = 1, \dots, r$, $\mathbf{v}_i \neq \mathbf{0}$, and that if $i \neq j$, then $\lambda_i \neq \lambda_j$.
Then the set of eigenvectors, $\{\mathbf{v}_1, \dots, \mathbf{v}_r\}$, is linearly independent.

Proof. Suppose the claim of the theorem is not true. Then there exists a subset of this set of
vectors

\[ \{\mathbf{w}_1, \dots, \mathbf{w}_s\} \subseteq \{\mathbf{v}_1, \dots, \mathbf{v}_r\} \]

such that

\[ \sum_{j=1}^{s} c_j \mathbf{w}_j = \mathbf{0} \tag{12.6} \]

where each $c_j \neq 0$. Say $M\mathbf{w}_j = \mu_j \mathbf{w}_j$ where

\[ \{\mu_1, \dots, \mu_s\} \subseteq \{\lambda_1, \dots, \lambda_r\}, \]

the $\mu_j$ being distinct eigenvalues of $M$. Out of all such subsets, let this one be such that $s$ is as
small as possible. Then necessarily, $s > 1$ because otherwise, $c_1 \mathbf{w}_1 = \mathbf{0}$ which would imply $\mathbf{w}_1 = \mathbf{0}$,
which is not allowed for eigenvectors.
Now apply $M$ to both sides of 12.6.

\[ \sum_{j=1}^{s} c_j \mu_j \mathbf{w}_j = \mathbf{0}. \tag{12.7} \]

Next pick $\mu_k \neq 0$ and multiply both sides of 12.6 by $\mu_k$. Such a $\mu_k$ exists because $s > 1$ and the
$\mu_j$ are distinct. Thus

\[ \sum_{j=1}^{s} c_j \mu_k \mathbf{w}_j = \mathbf{0} \tag{12.8} \]

Subtract the sum in 12.8 from the sum in 12.7 to obtain

\[ \sum_{j=1}^{s} c_j \left( \mu_j - \mu_k \right) \mathbf{w}_j = \mathbf{0}. \]

Now one of the constants $c_j (\mu_j - \mu_k)$ equals 0, when $j = k$, while the others are nonzero because
the $\mu_j$ are distinct. This is a relation of the same kind involving fewer than $s$ vectors. Therefore, $s$ was
not as small as possible after all. ■

Here is another proof in case you did not follow the above. 

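Theorem 12.1.14 Suppose $M\mathbf{v}_i = \lambda_i \mathbf{v}_i$, $i = 1, \dots, r$, $\mathbf{v}_i \neq \mathbf{0}$, and that if $i \neq j$, then $\lambda_i \neq \lambda_j$.
Then the set of eigenvectors, $\{\mathbf{v}_1, \dots, \mathbf{v}_r\}$, is linearly independent.

Proof: Suppose the conclusion is not true. Then in the matrix

\[ \begin{pmatrix} \mathbf{v}_1 & \mathbf{v}_2 & \cdots & \mathbf{v}_r \end{pmatrix} \]

not every column is a pivot column. Let the pivot columns be $\{\mathbf{w}_1, \dots, \mathbf{w}_k\}$, $k < r$. Then there
exists $\mathbf{v} \in \{\mathbf{v}_1, \dots, \mathbf{v}_r\}$, $M\mathbf{v} = \lambda_{\mathbf{v}} \mathbf{v}$, $\mathbf{v} \notin \{\mathbf{w}_1, \dots, \mathbf{w}_k\}$, such that

\[ \mathbf{v} = \sum_{i=1}^{k} c_i \mathbf{w}_i. \tag{12.9} \]

Then doing $M$ to both sides yields

\[ \lambda_{\mathbf{v}} \mathbf{v} = \sum_{i=1}^{k} c_i \lambda_{\mathbf{w}_i} \mathbf{w}_i \tag{12.10} \]

But also you could multiply both sides of 12.9 by $\lambda_{\mathbf{v}}$ to get

\[ \lambda_{\mathbf{v}} \mathbf{v} = \sum_{i=1}^{k} c_i \lambda_{\mathbf{v}} \mathbf{w}_i. \]

And now subtracting 12.10 from this yields

\[ \mathbf{0} = \sum_{i=1}^{k} c_i \left( \lambda_{\mathbf{v}} - \lambda_{\mathbf{w}_i} \right) \mathbf{w}_i \]

and by independence of the $\{\mathbf{w}_1, \dots, \mathbf{w}_k\}$, this requires $c_i (\lambda_{\mathbf{v}} - \lambda_{\mathbf{w}_i}) = 0$ for each $i$. Since the
eigenvalues are distinct, $\lambda_{\mathbf{v}} - \lambda_{\mathbf{w}_i} \neq 0$ and so each $c_i = 0$. But from 12.9, this requires $\mathbf{v} = \mathbf{0}$ which
is impossible because $\mathbf{v}$ is an eigenvector and

Eigenvectors are never equal to zero! ■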
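
The theorem can be checked numerically on a small case. The following is a minimal sketch, assuming NumPy; the upper triangular matrix `M` is a hypothetical example chosen only because its eigenvalues 1, 2, 3 are visibly distinct. Placing the computed eigenvectors as columns of one matrix and finding that its rank equals the number of columns is the same as saying every column is a pivot column.

```python
import numpy as np

# Hypothetical matrix with three distinct eigenvalues 1, 2, 3 (read off the diagonal).
M = np.array([[1.0, 4.0, 5.0],
              [0.0, 2.0, 6.0],
              [0.0, 0.0, 3.0]])

# Column j of V is an eigenvector for eigenvalues[j].
eigenvalues, V = np.linalg.eig(M)

# Full rank means the eigenvectors form a linearly independent set.
rank_V = np.linalg.matrix_rank(V)
print("eigenvalues:", eigenvalues)
print("rank of eigenvector matrix:", rank_V)
```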



12.1.6 Diagonalization 

First of all, here is what it means for two matrices to be similar. 

Definition 12.1.15 Let $A, B$ be two $n \times n$ matrices. Then they are similar if and only if there
exists an invertible matrix $S$ such that

\[ A = S^{-1} B S \]

Proposition 12.1.16 Define for $n \times n$ matrices $A \sim B$ if $A$ is similar to $B$. Then

$A \sim A$,
If $A \sim B$ then $B \sim A$,
If $A \sim B$ and $B \sim C$ then $A \sim C$.

Proof: It is clear that $A \sim A$ because you could just take $S = I$. If $A \sim B$, then for some $S$
invertible,

\[ A = S^{-1} B S \]

and so

\[ S A S^{-1} = B \]

But then

\[ \left( S^{-1} \right)^{-1} A S^{-1} = B \]

which shows that $B \sim A$.

Now suppose $A \sim B$ and $B \sim C$. Then there exist invertible matrices $S, T$ such that

\[ A = S^{-1} B S, \quad B = T^{-1} C T. \]

Therefore,

\[ A = S^{-1} T^{-1} C T S = (TS)^{-1} C (TS) \]

showing that $A$ is similar to $C$. ■

For your information, when $\sim$ satisfies the above conditions, it is called an equivalence relation.
Equivalence relations are very significant in mathematics.

When a matrix is similar to a diagonal matrix, the matrix is said to be diagonalizable. I think 
this is one of the worst monstrosities for a word that I have ever seen. Nevertheless, it is commonly 
used in linear algebra. It turns out to be the same as nondefective which will follow easily from later 
material. The following is the precise definition.

Definition 12.1.17 Let $A$ be an $n \times n$ matrix. Then $A$ is diagonalizable if there exists an invertible
matrix $S$ such that

\[ S^{-1} A S = D \]

where $D$ is a diagonal matrix. This means $D$ has a zero as every entry except for the main diagonal.
More precisely, $D_{ij} = 0$ unless $i = j$. Such matrices look like the following.

\[ \begin{pmatrix} * & & 0 \\ & \ddots & \\ 0 & & * \end{pmatrix} \]

where $*$ might not be zero.

The most important theorem about diagonalizability¹ is the following major result.

Theorem 12.1.18 An $n \times n$ matrix $A$ is diagonalizable if and only if $\mathbb{F}^n$ has a basis of eigenvectors
of $A$. Furthermore, you can take the matrix $S$ described above, to be given as

\[ S = \begin{pmatrix} \mathbf{v}_1 & \mathbf{v}_2 & \cdots & \mathbf{v}_n \end{pmatrix} \]

where here the $\mathbf{v}_k$ are the eigenvectors in the basis for $\mathbb{F}^n$. If $A$ is diagonalizable, the eigenvalues of
$A$ are the diagonal entries of the diagonal matrix.

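Proof: Suppose there exists a basis of eigenvectors $\{\mathbf{v}_k\}$ where $A\mathbf{v}_k = \lambda_k \mathbf{v}_k$. Then let $S$ be
given as above. It follows $S^{-1}$ exists and is of the form

\[ S^{-1} = \begin{pmatrix} \mathbf{w}_1^T \\ \mathbf{w}_2^T \\ \vdots \\ \mathbf{w}_n^T \end{pmatrix} \]

where $\mathbf{w}_k^T \mathbf{v}_j = \delta_{kj}$. Then

\[ \begin{pmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{pmatrix}
= \begin{pmatrix} \mathbf{w}_1^T \\ \vdots \\ \mathbf{w}_n^T \end{pmatrix} \begin{pmatrix} \lambda_1 \mathbf{v}_1 & \lambda_2 \mathbf{v}_2 & \cdots & \lambda_n \mathbf{v}_n \end{pmatrix}
= \begin{pmatrix} \mathbf{w}_1^T \\ \vdots \\ \mathbf{w}_n^T \end{pmatrix} \begin{pmatrix} A\mathbf{v}_1 & A\mathbf{v}_2 & \cdots & A\mathbf{v}_n \end{pmatrix}
= S^{-1} A S \]

Next suppose $A$ is diagonalizable so that $S^{-1} A S = D$. Let $S = \begin{pmatrix} \mathbf{v}_1 & \mathbf{v}_2 & \cdots & \mathbf{v}_n \end{pmatrix}$ where the
columns are the $\mathbf{v}_k$ and

\[ D = \begin{pmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{pmatrix} \]

Then

\[ AS = SD = \begin{pmatrix} \mathbf{v}_1 & \mathbf{v}_2 & \cdots & \mathbf{v}_n \end{pmatrix} \begin{pmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{pmatrix} \]

and so

\[ \begin{pmatrix} A\mathbf{v}_1 & A\mathbf{v}_2 & \cdots & A\mathbf{v}_n \end{pmatrix} = \begin{pmatrix} \lambda_1 \mathbf{v}_1 & \lambda_2 \mathbf{v}_2 & \cdots & \lambda_n \mathbf{v}_n \end{pmatrix} \]

showing the $\mathbf{v}_k$ are eigenvectors of $A$ and the $\lambda_k$ are eigenvalues. Now the $\mathbf{v}_k$ form a basis for $\mathbb{F}^n$
because the matrix $S$ having these vectors as columns is given to be invertible. ■

¹ This word has 9 syllables.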
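
The statement of the theorem can be tried out on a computer. What follows is a minimal sketch, assuming NumPy; the matrix `A` is an illustrative 3 x 3 matrix with eigenvalues 2, 2, 6, and the columns of `S` are eigenvectors found by hand for it (both are assumptions of this sketch, not a prescribed method). If the columns of `S` really form a basis of eigenvectors, then $S^{-1}AS$ should come out diagonal with the eigenvalues on the diagonal.

```python
import numpy as np

# Illustrative matrix with eigenvalues 2 (multiplicity two) and 6.
A = np.array([[ 2.0,  0.0,  0.0],
              [ 1.0,  4.0, -1.0],
              [-2.0, -4.0,  4.0]])

# Columns are eigenvectors: the first two for eigenvalue 2, the last for eigenvalue 6.
S = np.array([[-2.0, 1.0,  0.0],
              [ 1.0, 0.0,  1.0],
              [ 0.0, 1.0, -2.0]])

# S is invertible exactly when its columns form a basis.
D = np.linalg.inv(S) @ A @ S
print(np.round(D, 10))
```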



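Example 12.1.19 Let $A = \begin{pmatrix} 2 & 0 & 0 \\ 1 & 4 & -1 \\ -2 & -4 & 4 \end{pmatrix}$. Find a matrix $S$ such that $S^{-1} A S = D$, a diagonal matrix.

Solving $\det(\lambda I - A) = 0$ yields the eigenvalues are 2 and 6 with 2 an eigenvalue of multiplicity
two. Solving $(2I - A)\mathbf{x} = \mathbf{0}$ to find the eigenvectors, you find that the eigenvectors are

\[ a \begin{pmatrix} -2 \\ 1 \\ 0 \end{pmatrix} + b \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix} \]

where $a, b$ are scalars. An eigenvector for $\lambda = 6$ is $\begin{pmatrix} 0 \\ 1 \\ -2 \end{pmatrix}$. Let the matrix $S$ be

\[ S = \begin{pmatrix} -2 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & -2 \end{pmatrix} \]

That is, the columns are the eigenvectors. Then

\[ S^{-1} A S = \begin{pmatrix} -\frac{1}{4} & \frac{1}{2} & \frac{1}{4} \\ \frac{1}{2} & 1 & \frac{1}{2} \\ \frac{1}{4} & \frac{1}{2} & -\frac{1}{4} \end{pmatrix}
\begin{pmatrix} 2 & 0 & 0 \\ 1 & 4 & -1 \\ -2 & -4 & 4 \end{pmatrix}
\begin{pmatrix} -2 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & -2 \end{pmatrix}
= \begin{pmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 6 \end{pmatrix} \]

We know the result from the above theorem, but it is nice to see it work in a specific example just
the same. You may wonder if there is a need to find $S^{-1}$. The following is an example of a situation
where this is needed. It is one of the major applications of diagonalizability.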



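Example 12.1.20 Here is a matrix. $A = \begin{pmatrix} 2 & 1 & 0 \\ 0 & 1 & 0 \\ -1 & -1 & 1 \end{pmatrix}$. Find $A^{50}$.

Sometimes this sort of problem can be made easy by using diagonalization. In this case there
are eigenvectors,

\[ \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}, \begin{pmatrix} -1 \\ 1 \\ 0 \end{pmatrix}, \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix}, \]

the first two corresponding to $\lambda = 1$ and the last corresponding to $\lambda = 2$. Then let the eigenvectors
be the columns of the matrix $S$. Thus

\[ S = \begin{pmatrix} 0 & -1 & -1 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix} \]

Then also

\[ S^{-1} = \begin{pmatrix} 1 & 1 & 1 \\ 0 & 1 & 0 \\ -1 & -1 & 0 \end{pmatrix} \]

and

\[ S^{-1} A S = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{pmatrix} = D \]

Now it follows

\[ A = S D S^{-1} = \begin{pmatrix} 0 & -1 & -1 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{pmatrix} \begin{pmatrix} 1 & 1 & 1 \\ 0 & 1 & 0 \\ -1 & -1 & 0 \end{pmatrix} \]

Note that $(SDS^{-1})^2 = SDS^{-1}SDS^{-1} = SD^2S^{-1}$ and

\[ (SDS^{-1})^3 = SDS^{-1}SDS^{-1}SDS^{-1} = SD^3S^{-1}, \]

etc. In general, you can see that

\[ (SDS^{-1})^n = SD^nS^{-1} \]

In other words, $A^n = SD^nS^{-1}$. Therefore,

\[ A^{50} = S D^{50} S^{-1} = \begin{pmatrix} 0 & -1 & -1 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{pmatrix}^{50} \begin{pmatrix} 1 & 1 & 1 \\ 0 & 1 & 0 \\ -1 & -1 & 0 \end{pmatrix} \]

It is easy to raise a diagonal matrix to a power.

\[ \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{pmatrix}^{50} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2^{50} \end{pmatrix} \]

It follows

\[ A^{50} = \begin{pmatrix} 0 & -1 & -1 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2^{50} \end{pmatrix} \begin{pmatrix} 1 & 1 & 1 \\ 0 & 1 & 0 \\ -1 & -1 & 0 \end{pmatrix}
= \begin{pmatrix} 2^{50} & -1 + 2^{50} & 0 \\ 0 & 1 & 0 \\ 1 - 2^{50} & 1 - 2^{50} & 1 \end{pmatrix} \]

That isn't too hard. However, this would have been horrendous if you had tried to multiply $A^{50}$ by
hand.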

This technique of diagonalization is also important in solving the differential equations resulting
from vibrations. Sometimes you have systems of differential equations and when you diagonalize an
appropriate matrix, you "decouple" the equations. This is very nice. It makes hard problems trivial.

The above example is entirely typical. If $A = SDS^{-1}$ then $A^m = SD^mS^{-1}$ and it is easy to
compute $D^m$. More generally, you can define functions of the matrix using power series in this way.
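
The identity $A^m = SD^mS^{-1}$ is easy to test with exact integer arithmetic. The sketch below uses plain Python lists; the helper `matmul` is a hypothetical name, and the matrices `A`, `S`, `S_inv` are the ones from the $A^{50}$ example as read here, so treat them as assumptions of the sketch. It compares the diagonalization shortcut against fifty repeated multiplications.

```python
def matmul(X, Y):
    """Multiply two square matrices given as lists of rows, using exact integers."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[2, 1, 0], [0, 1, 0], [-1, -1, 1]]
S = [[0, -1, -1], [0, 1, 0], [1, 0, 1]]        # columns are eigenvectors of A
S_inv = [[1, 1, 1], [0, 1, 0], [-1, -1, 0]]    # inverse of S

m = 50
# Powers of the diagonal matrix diag(1, 1, 2) are computed entry by entry.
D_m = [[1, 0, 0], [0, 1, 0], [0, 0, 2 ** m]]

# Shortcut: A^m = S D^m S^{-1}.
A_m = matmul(matmul(S, D_m), S_inv)

# Brute force check: multiply by A a total of m times, starting from the identity.
P = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
for _ in range(m):
    P = matmul(P, A)

print(A_m == P)
```

The shortcut needs two matrix products no matter how large $m$ is, while the brute force loop needs $m$ of them.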

12.1.7 The Matrix Exponential 

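When $A$ is diagonalizable, one can easily define what is meant by $e^A$. Here is how. You know

\[ S^{-1} A S = D \]

where $D$ is a diagonal matrix. You also know that if $D$ is of the form

\[ D = \begin{pmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{pmatrix} \tag{12.11} \]

then

\[ D^m = \begin{pmatrix} \lambda_1^m & & 0 \\ & \ddots & \\ 0 & & \lambda_n^m \end{pmatrix} \]

and that

\[ A^m = S D^m S^{-1} \]

as shown above. Recall why this was.

\[ A = S D S^{-1} \]

and so

\[ A^m = \underbrace{SDS^{-1} \, SDS^{-1} \, SDS^{-1} \cdots SDS^{-1}}_{m \text{ times}} = S D^m S^{-1}. \]

Now formally write the following power series for $e^A$

\[ e^A \equiv \sum_{k=0}^{\infty} \frac{A^k}{k!} = \sum_{k=0}^{\infty} \frac{S D^k S^{-1}}{k!} = S \left( \sum_{k=0}^{\infty} \frac{D^k}{k!} \right) S^{-1} \]

If $D$ is given above in 12.11, the above sum is of the form

\[ S \sum_{k=0}^{\infty} \begin{pmatrix} \frac{1}{k!} \lambda_1^k & & 0 \\ & \ddots & \\ 0 & & \frac{1}{k!} \lambda_n^k \end{pmatrix} S^{-1}
= S \begin{pmatrix} \sum_{k=0}^{\infty} \frac{1}{k!} \lambda_1^k & & 0 \\ & \ddots & \\ 0 & & \sum_{k=0}^{\infty} \frac{1}{k!} \lambda_n^k \end{pmatrix} S^{-1}
= S \begin{pmatrix} e^{\lambda_1} & & 0 \\ & \ddots & \\ 0 & & e^{\lambda_n} \end{pmatrix} S^{-1} \]

and this last thing is the definition of what is meant by $e^A$.

Example 12.1.21 Let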

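\[ A = \begin{pmatrix} 2 & -1 & -1 \\ 1 & 2 & 1 \\ -1 & 1 & 2 \end{pmatrix} \]

Find $e^A$.

The eigenvalues happen to be 1, 2, 3 and eigenvectors associated with these eigenvalues are

\[ \begin{pmatrix} 0 \\ -1 \\ 1 \end{pmatrix} \leftrightarrow 1, \quad \begin{pmatrix} -1 \\ -1 \\ 1 \end{pmatrix} \leftrightarrow 2, \quad \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix} \leftrightarrow 3 \]

Then let

\[ S = \begin{pmatrix} 0 & -1 & -1 \\ -1 & -1 & 0 \\ 1 & 1 & 1 \end{pmatrix} \]

and so

\[ S^{-1} = \begin{pmatrix} 1 & 0 & 1 \\ -1 & -1 & -1 \\ 0 & 1 & 1 \end{pmatrix} \]

and

\[ D = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{pmatrix} \]

Then the matrix exponential is

\[ \begin{pmatrix} 0 & -1 & -1 \\ -1 & -1 & 0 \\ 1 & 1 & 1 \end{pmatrix} \begin{pmatrix} e^1 & 0 & 0 \\ 0 & e^2 & 0 \\ 0 & 0 & e^3 \end{pmatrix} \begin{pmatrix} 1 & 0 & 1 \\ -1 & -1 & -1 \\ 0 & 1 & 1 \end{pmatrix}
= \begin{pmatrix} e^2 & e^2 - e^3 & e^2 - e^3 \\ e^2 - e & e^2 & e^2 - e \\ -e^2 + e & -e^2 + e^3 & -e^2 + e + e^3 \end{pmatrix} \]

Isn't that nice? You could also talk about $\sin(A)$ or $\cos(A)$ etc. You would just have to use a
different power series.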

This matrix exponential is actually a useful idea when solving autonomous systems of first order
linear differential equations. These are equations which are of the form

\[ \mathbf{x}' = A\mathbf{x} \]

where $\mathbf{x}$ is a vector in $\mathbb{R}^n$ or $\mathbb{C}^n$ and $A$ is an $n \times n$ matrix. Then it turns out that the solution to
the above system of equations is $\mathbf{x}(t) = e^{At}\mathbf{c}$ where $\mathbf{c}$ is a constant vector.
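
The definition of $e^A$ through diagonalization can be compared with the power series directly. Here is a minimal sketch, assuming NumPy; the matrices `A` and `S` and the eigenvalues 1, 2, 3 are those of the matrix exponential example as read here, and the cutoff of 40 series terms is an arbitrary choice for this sketch.

```python
import math
import numpy as np

A = np.array([[ 2.0, -1.0, -1.0],
              [ 1.0,  2.0,  1.0],
              [-1.0,  1.0,  2.0]])
# Columns of S are eigenvectors for the eigenvalues 1, 2, 3 in that order.
S = np.array([[ 0.0, -1.0, -1.0],
              [-1.0, -1.0,  0.0],
              [ 1.0,  1.0,  1.0]])
lams = np.array([1.0, 2.0, 3.0])

# e^A = S diag(e^1, e^2, e^3) S^{-1}
exp_A = S @ np.diag(np.exp(lams)) @ np.linalg.inv(S)

# Truncated power series sum_{k=0}^{39} A^k / k! for comparison.
series = np.zeros((3, 3))
term = np.eye(3)              # holds A^k / k!, starting with k = 0
for k in range(40):
    series += term
    term = term @ A / (k + 1)

print(np.max(np.abs(exp_A - series)))
```

The top left entry of `exp_A` should be close to `math.e ** 2`, in agreement with the closed form worked out by hand.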

12.1.8 Complex Eigenvalues 

Sometimes you have to consider eigenvalues which are complex numbers. This occurs in differential 
equations for example. You do these problems exactly the same way as you do the ones in which 
the eigenvalues are real. Here is an example. 

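Example 12.1.22 Find the eigenvalues and eigenvectors of the matrix

\[ A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & -1 \\ 0 & 1 & 2 \end{pmatrix} \]

You need to find the eigenvalues. Solve

\[ \det \left( \lambda \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} - \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & -1 \\ 0 & 1 & 2 \end{pmatrix} \right) = 0 \]

This reduces to $(\lambda - 1)(\lambda^2 - 4\lambda + 5) = 0$. The solutions are $\lambda = 1$, $\lambda = 2 + i$, $\lambda = 2 - i$.

There is nothing new about finding the eigenvectors for $\lambda = 1$ so consider the eigenvalue $\lambda = 2 + i$.
You need to solve

\[ \left( (2 + i) \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} - \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & -1 \\ 0 & 1 & 2 \end{pmatrix} \right) \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} \]

In other words, you must consider the augmented matrix

\[ \left( \begin{array}{ccc|c} 1 + i & 0 & 0 & 0 \\ 0 & i & 1 & 0 \\ 0 & -1 & i & 0 \end{array} \right) \]

for the solution. Divide the top row by $(1 + i)$ and then take $-i$ times the second row and add to
the bottom. This yields

\[ \left( \begin{array}{ccc|c} 1 & 0 & 0 & 0 \\ 0 & i & 1 & 0 \\ 0 & 0 & 0 & 0 \end{array} \right) \]

Now multiply the second row by $-i$ to obtain

\[ \left( \begin{array}{ccc|c} 1 & 0 & 0 & 0 \\ 0 & 1 & -i & 0 \\ 0 & 0 & 0 & 0 \end{array} \right) \]

Therefore, the eigenvectors are of the form

\[ t \begin{pmatrix} 0 \\ i \\ 1 \end{pmatrix} \]

You should find the eigenvectors for $\lambda = 2 - i$. These are

\[ t \begin{pmatrix} 0 \\ -i \\ 1 \end{pmatrix} \]

As usual, if you want to get it right you had better check it.

\[ \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & -1 \\ 0 & 1 & 2 \end{pmatrix} \begin{pmatrix} 0 \\ -i \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ -1 - 2i \\ 2 - i \end{pmatrix} = (2 - i) \begin{pmatrix} 0 \\ -i \\ 1 \end{pmatrix} \]

so it worked.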
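
The same check can be done by machine, since complex arithmetic is no harder for a computer than real arithmetic. This is a minimal sketch, assuming NumPy; the matrix `A` and the candidate eigenvector `v` are those of the complex eigenvalue example as read here.

```python
import numpy as np

A = np.array([[1.0, 0.0,  0.0],
              [0.0, 2.0, -1.0],
              [0.0, 1.0,  2.0]])

# Candidate eigenpair found by hand: lambda = 2 + i with eigenvector (0, i, 1).
lam = 2 + 1j
v = np.array([0, 1j, 1])
residual = A @ v - lam * v      # should be the zero vector

# All eigenvalues at once; a real matrix returns complex ones in conjugate pairs.
eigs = np.linalg.eigvals(A)
print(residual)
print(eigs)
```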



12.2 Some Applications Of Eigenvalues And Eigenvectors 



12.2.1 Principal Directions

Recall that n x n matrices can be considered as linear transformations. If F is a 3 x 3 real matrix 
having positive determinant, it can be shown that F = RU where R is a rotation matrix and U is a 
symmetric real matrix having positive eigenvalues. An application of this wonderful result, known 
to mathematicians as the right polar factorization, is to continuum mechanics where a chunk of 
material is identified with a set of points in three dimensional space. 

The linear transformation, F in this context is called the deformation gradient and it describes 
the local deformation of the material. Thus it is possible to consider this deformation in terms of 
two processes, one which distorts the material and the other which just rotates it. It is the matrix 
U which is responsible for stretching and compressing. This is why in elasticity, the stress is often 
taken to depend on U which is known in this context as the right Cauchy Green strain tensor. 
In this context, the eigenvalues will always be positive. The symmetry of $U$ allows the proof of a
theorem which says that if $\lambda_M$ is the largest eigenvalue, then in every other direction other than
the one corresponding to the eigenvector for $\lambda_M$ the material is stretched less than $\lambda_M$ and if $\lambda_m$
is the smallest eigenvalue, then in every other direction other than the one corresponding to an
eigenvector of $\lambda_m$ the material is stretched more than $\lambda_m$. This process of writing a matrix as a
product of two such matrices, one of which preserves distance and the other which distorts, is also
important in applications to geometric measure theory, an interesting field of study in mathematics,
and to the study of quadratic forms which occur in many applications such as statistics. Here we
are emphasizing the application to mechanics in which the eigenvectors of the symmetric matrix $U$
determine the principal directions, those directions in which the material is stretched the most
or the least.

Example 12.2.1 Find the principal directions determined by the matrix

\[ \begin{pmatrix} \frac{29}{11} & \frac{6}{11} & \frac{6}{11} \\ \frac{6}{11} & \frac{41}{44} & \frac{19}{44} \\ \frac{6}{11} & \frac{19}{44} & \frac{41}{44} \end{pmatrix} \]

The eigenvalues are 3, 1, and $\frac{1}{2}$.

It is nice to be given the eigenvalues. The largest eigenvalue is 3 which means that in the
direction determined by the eigenvector associated with 3 the stretch is three times as large. The
smallest eigenvalue is 1/2 and so in the direction determined by the eigenvector for 1/2 the material
is stretched by a factor of 1/2, becoming locally half as long. It remains to find these directions.
First consider the eigenvector for 3. It is necessary to solve

\[ \left( 3 \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} - \begin{pmatrix} \frac{29}{11} & \frac{6}{11} & \frac{6}{11} \\ \frac{6}{11} & \frac{41}{44} & \frac{19}{44} \\ \frac{6}{11} & \frac{19}{44} & \frac{41}{44} \end{pmatrix} \right) \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} \]

Thus the augmented matrix for this system of equations is

\[ \left( \begin{array}{ccc|c} \frac{4}{11} & -\frac{6}{11} & -\frac{6}{11} & 0 \\ -\frac{6}{11} & \frac{91}{44} & -\frac{19}{44} & 0 \\ -\frac{6}{11} & -\frac{19}{44} & \frac{91}{44} & 0 \end{array} \right) \]

The row reduced echelon form is

\[ \left( \begin{array}{ccc|c} 1 & 0 & -3 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 0 \end{array} \right) \]

and so the principal direction for the eigenvalue 3, in which the material is stretched to the maximum
extent, is

\[ \begin{pmatrix} 3 \\ 1 \\ 1 \end{pmatrix} \]

A direction vector (or unit vector) in this direction is

\[ \begin{pmatrix} 3/\sqrt{11} \\ 1/\sqrt{11} \\ 1/\sqrt{11} \end{pmatrix} \]

You should show that the direction in which the material is compressed the most is in the direction

\[ \begin{pmatrix} 0 \\ -1/\sqrt{2} \\ 1/\sqrt{2} \end{pmatrix} \]

Note this is meaningful information which you would have a hard time finding without the theory
of eigenvectors and eigenvalues.
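
For a symmetric matrix the principal directions can also be obtained numerically. The following is a minimal sketch, assuming NumPy and its symmetric eigenvalue routine `numpy.linalg.eigh`; the matrix `U` is the one from the principal directions example as read here. The routine returns the eigenvalues in increasing order with orthonormal eigenvectors as columns, so the last column is the direction of greatest stretch and the first the direction of greatest compression.

```python
import numpy as np

U = np.array([[29/11,  6/11,  6/11],
              [ 6/11, 41/44, 19/44],
              [ 6/11, 19/44, 41/44]])

# eigh is meant for symmetric matrices: ascending eigenvalues, orthonormal columns.
stretches, directions = np.linalg.eigh(U)

max_dir = directions[:, -1]   # unit vector for the largest eigenvalue
min_dir = directions[:, 0]    # unit vector for the smallest eigenvalue
print(stretches)
print(max_dir, min_dir)
```

An eigenvector is only determined up to sign, so the computed unit vectors may be the negatives of the ones found by hand.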

12.2.2 Migration Matrices 

There are applications which are of great importance which feature only one eigenvalue. 

Definition 12.2.2 Let $n$ locations be denoted by the numbers $1, 2, \dots, n$. Also suppose it is the case
that each year $a_{ij}$ denotes the proportion of residents in location $j$ which move to location $i$. Also
suppose no one leaves these $n$ locations and no one arrives from outside them. This last assumption requires
$\sum_i a_{ij} = 1$. Such matrices in which the columns are nonnegative numbers which sum to one are
called Markov matrices. In this context describing migration, they are also called migration
matrices.

Example 12.2.3 Here is an example of one of these matrices.

\[ \begin{pmatrix} .4 & .2 \\ .6 & .8 \end{pmatrix} \]

Thus if it is considered as a migration matrix, .4 is the proportion of residents in location 1 which
stay in location 1 in a given time period while .6 is the proportion of residents in location 1 which
move to location 2 and .2 is the proportion of residents in location 2 which move to location 1.
Considered as a Markov matrix, these numbers are usually identified with probabilities.

If $\mathbf{v} = (x_1, \dots, x_n)^T$ where $x_i$ is the population of location $i$ at a given instant, you obtain the
population of location $i$ one year later by computing $\sum_j a_{ij} x_j = (A\mathbf{v})_i$. Therefore, the population
of location $i$ after $k$ years is $(A^k \mathbf{v})_i$. An obvious application of this would be to a situation in which
you rent trailers which can go to various parts of a city and you observe through experiments the
proportion of trailers which go from point $i$ to point $j$ in a single day. Then you might want to find
how many trailers would be in all the locations after 8 days.

Proposition 12.2.4 Let $A = (a_{ij})$ be a migration matrix. Then 1 is always an eigenvalue for $A$.

Proof: Remember that $\det(B^T) = \det(B)$. Therefore,

\[ \det(A - \lambda I) = \det\left( (A - \lambda I)^T \right) = \det\left( A^T - \lambda I \right) \]

because $I^T = I$. Thus the characteristic equation for $A$ is the same as the characteristic equation
for $A^T$ and so $A$ and $A^T$ have the same eigenvalues. We will show that 1 is an eigenvalue for $A^T$
and then it will follow that 1 is an eigenvalue for $A$.

Remember that for a migration matrix, $\sum_i a_{ij} = 1$. Therefore, if $A^T = (b_{ij})$ so $b_{ij} = a_{ji}$, it
follows that

\[ \sum_j b_{ij} = \sum_j a_{ji} = 1. \]

Therefore, from matrix multiplication,

\[ A^T \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix} = \begin{pmatrix} \sum_j b_{1j} \\ \vdots \\ \sum_j b_{nj} \end{pmatrix} = \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix} \]

which shows that $\begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix}$ is an eigenvector for $A^T$ corresponding to the eigenvalue $\lambda = 1$. As
explained above, this shows that $\lambda = 1$ is an eigenvalue for $A$ because $A$ and $A^T$ have the same
eigenvalues. ■
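
The two steps of this proof are easy to watch in a small case. Here is a minimal sketch, assuming NumPy; the 2 x 2 matrix is the small migration matrix with entries .4, .2, .6, .8 as read here. It checks that the vector of ones is fixed by $A^T$ and that 1 shows up among the eigenvalues of $A$ itself.

```python
import numpy as np

A = np.array([[0.4, 0.2],
              [0.6, 0.8]])

# Each column sums to one, which is the defining property of a migration matrix.
col_sums = A.sum(axis=0)

# Step 1 of the proof: A^T maps the vector of ones to itself.
ones = np.ones(2)
image = A.T @ ones

# Step 2: A and A^T have the same eigenvalues, so 1 is an eigenvalue of A.
eigs = np.linalg.eigvals(A)
print(col_sums, image, eigs)
```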

Example 12.2.5 Consider the migration matrix

\[ \begin{pmatrix} .6 & 0 & .1 \\ .2 & .8 & 0 \\ .2 & .2 & .9 \end{pmatrix} \]

for locations 1, 2, and 3. Suppose initially there are 100 residents in location 1, 200 in location 2
and 400 in location 3. Find the population in the three locations after 10 units of time.

From the above, it suffices to consider

\[ \begin{pmatrix} .6 & 0 & .1 \\ .2 & .8 & 0 \\ .2 & .2 & .9 \end{pmatrix}^{10} \begin{pmatrix} 100 \\ 200 \\ 400 \end{pmatrix} = \begin{pmatrix} 115.08582922 \\ 120.13067244 \\ 464.78349834 \end{pmatrix} \]

Of course you would need to round these numbers off.

A related problem asks for how many there will be in the various locations after a long time. It
turns out that if some power of the migration matrix has all positive entries, then there is a limiting
vector $\mathbf{x} = \lim_{k \to \infty} A^k \mathbf{x}_0$ where $\mathbf{x}_0$ is the initial vector describing the number of inhabitants in the
various locations initially. This vector will be an eigenvector for the eigenvalue 1 because

\[ \mathbf{x} = \lim_{k \to \infty} A^k \mathbf{x}_0 = \lim_{k \to \infty} A^{k+1} \mathbf{x}_0 = A \lim_{k \to \infty} A^k \mathbf{x}_0 = A\mathbf{x}, \]

and the sum of its entries will equal the sum of the entries of the initial vector $\mathbf{x}_0$ because this sum
is preserved for every multiplication by $A$ since

\[ \sum_i \sum_j a_{ij} x_j = \sum_j x_j \left( \sum_i a_{ij} \right) = \sum_j x_j. \]

Here is an example. It is the same example as the one above but here it will involve the long time
limit.

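Example 12.2.6 Consider the migration matrix

\[ \begin{pmatrix} .6 & 0 & .1 \\ .2 & .8 & 0 \\ .2 & .2 & .9 \end{pmatrix} \]

for locations 1, 2, and 3. Suppose initially there are 100 residents in location 1, 200 in location 2
and 400 in location 3. Find the population in the three locations after a long time.

You just need to find the eigenvector which goes with the eigenvalue 1 and then normalize it so
the sum of its entries equals the sum of the entries of the initial vector. Thus you need to find a
solution to

\[ \left( \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} - \begin{pmatrix} .6 & 0 & .1 \\ .2 & .8 & 0 \\ .2 & .2 & .9 \end{pmatrix} \right) \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} .4 & 0 & -.1 \\ -.2 & .2 & 0 \\ -.2 & -.2 & .1 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} \]

The augmented matrix is

\[ \left( \begin{array}{ccc|c} .4 & 0 & -.1 & 0 \\ -.2 & .2 & 0 & 0 \\ -.2 & -.2 & .1 & 0 \end{array} \right) \]

and its row reduced echelon form is

\[ \left( \begin{array}{ccc|c} 1 & 0 & -\frac{1}{4} & 0 \\ 0 & 1 & -\frac{1}{4} & 0 \\ 0 & 0 & 0 & 0 \end{array} \right) \]

Therefore, the eigenvectors are

\[ s \begin{pmatrix} 1/4 \\ 1/4 \\ 1 \end{pmatrix} \]

and all that remains is to choose the value of $s$ such that

\[ \frac{1}{4} s + \frac{1}{4} s + s = 100 + 200 + 400 \]

This yields $s = \frac{1400}{3}$ and so the long time limit would equal

\[ \begin{pmatrix} 116.6666666666667 \\ 116.6666666666667 \\ 466.6666666666667 \end{pmatrix} \]

You would of course need to round these numbers off. You see that you are not far off after just 10
units of time. Therefore, you might consider this as a useful procedure because it is probably easier
to solve a simple system of equations than it is to raise a matrix to a large power.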
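
Both computations, the population after 10 steps and the long time limit, take only a few lines on a computer. This is a minimal sketch, assuming NumPy; the matrix and the initial populations 100, 200, 400 are those of the two migration examples as read here, and picking the eigenvalue closest to 1 with `argmin` is a convenience of the sketch rather than a method from the text.

```python
import numpy as np

A = np.array([[0.6, 0.0, 0.1],
              [0.2, 0.8, 0.0],
              [0.2, 0.2, 0.9]])
x0 = np.array([100.0, 200.0, 400.0])

# Population after 10 units of time: A^10 x0.
x10 = np.linalg.matrix_power(A, 10) @ x0

# Long time limit: the eigenvector for eigenvalue 1, rescaled so that its
# entries add up to the total initial population.
w, V = np.linalg.eig(A)
k = np.argmin(np.abs(w - 1.0))      # index of the eigenvalue closest to 1
v = np.real(V[:, k])
x_inf = v * x0.sum() / v.sum()

print(np.round(x10, 4))
print(np.round(x_inf, 4))
```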







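Example 12.2.7 Suppose a migration matrix is

\[ \begin{pmatrix} \frac{1}{5} & \frac{1}{2} & \frac{1}{5} \\ \frac{1}{4} & \frac{1}{4} & \frac{1}{2} \\ \frac{11}{20} & \frac{1}{4} & \frac{3}{10} \end{pmatrix} \]

Find the comparison between the populations in the three locations after a long time.

This amounts to nothing more than finding the eigenvector for $\lambda = 1$. Solve

\[ \left( \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} - \begin{pmatrix} \frac{1}{5} & \frac{1}{2} & \frac{1}{5} \\ \frac{1}{4} & \frac{1}{4} & \frac{1}{2} \\ \frac{11}{20} & \frac{1}{4} & \frac{3}{10} \end{pmatrix} \right) \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} \]

The augmented matrix is

\[ \left( \begin{array}{ccc|c} \frac{4}{5} & -\frac{1}{2} & -\frac{1}{5} & 0 \\ -\frac{1}{4} & \frac{3}{4} & -\frac{1}{2} & 0 \\ -\frac{11}{20} & -\frac{1}{4} & \frac{7}{10} & 0 \end{array} \right) \]

The row echelon form is

\[ \left( \begin{array}{ccc|c} 1 & 0 & -\frac{16}{19} & 0 \\ 0 & 1 & -\frac{18}{19} & 0 \\ 0 & 0 & 0 & 0 \end{array} \right) \]

and so an eigenvector is

\[ \begin{pmatrix} \frac{16}{19} \\ \frac{18}{19} \\ 1 \end{pmatrix} \]

Thus there will be $\frac{18}{16}$ times as many in location 2 as in location 1. There will be $\frac{19}{18}$ times as many
in location 3 as in location 2.

You see the eigenvalue problem makes these sorts of determinations fairly simple.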

There are many other things which can be said about these sorts of migration problems.
They include things like the gambler's ruin problem which asks for the probability that a compulsive
gambler will eventually lose all his money. However those problems are not so easy although they
still involve eigenvalues and eigenvectors.

There are many other important applications of eigenvalue problems. We have just given a few
such applications here. As pointed out, finding the eigenvalues is a very hard problem in general
but sometimes you don't need to find the eigenvalues exactly.

12.3 The Estimation Of Eigenvalues 

There are ways to estimate the eigenvalues for matrices from just looking at the matrix. The 
most famous is known as Gerschgorin's theorem. This theorem gives a rough idea where the 
eigenvalues are just from looking at the matrix. 

Theorem 12.3.1 Let $A$ be an $n \times n$ matrix. Consider the $n$ Gerschgorin discs defined as

\[ D_i \equiv \left\{ \lambda \in \mathbb{C} : |\lambda - a_{ii}| \leq \sum_{j \neq i} |a_{ij}| \right\}. \]

Then every eigenvalue is contained in some Gerschgorin disc.

This theorem says to add up the absolute values of the entries of the $i$-th row which are off the
main diagonal and form the disc centered at $a_{ii}$ having this radius. The union of these discs contains
$\sigma(A)$, the spectrum of $A$.

Proof: Suppose $A\mathbf{x} = \lambda \mathbf{x}$ where $\mathbf{x} \neq \mathbf{0}$. Then for $A = (a_{ij})$

\[ \sum_{j \neq i} a_{ij} x_j = (\lambda - a_{ii}) x_i. \]

Therefore, picking $k$ such that $|x_k| \geq |x_j|$ for all $x_j$, it follows that $|x_k| \neq 0$ since $|\mathbf{x}| \neq 0$ and

\[ |x_k| \sum_{j \neq k} |a_{kj}| \geq \sum_{j \neq k} |a_{kj}| |x_j| \geq \left| \sum_{j \neq k} a_{kj} x_j \right| = |\lambda - a_{kk}| |x_k|. \]

Now dividing by $|x_k|$, it follows $\lambda$ is contained in the $k$-th Gerschgorin disc.

Example 12.3.2 Suppose the matrix is

\[ A = \begin{pmatrix} 21 & -16 & -6 \\ 14 & 60 & 12 \\ 7 & 8 & 38 \end{pmatrix} \]

Estimate the eigenvalues.

The exact eigenvalues are 35, 56, and 28. The Gerschgorin disks are

\[ D_1 = \{ \lambda \in \mathbb{C} : |\lambda - 21| \leq 22 \}, \]

\[ D_2 = \{ \lambda \in \mathbb{C} : |\lambda - 60| \leq 26 \}, \]

and

\[ D_3 = \{ \lambda \in \mathbb{C} : |\lambda - 38| \leq 15 \}. \]

Gerschgorin's theorem says these three disks contain the eigenvalues. Now 35 is in $D_3$, 56 is in $D_2$
and 28 is in $D_1$.
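
The recipe in the theorem, row by row, is short enough to code directly. Here is a minimal sketch, assuming NumPy; the matrix is the one from the example above as read here, and the small tolerance `1e-9` is an assumption added to absorb rounding error. Each radius is the sum of absolute values across a row minus the absolute value of the diagonal entry.

```python
import numpy as np

A = np.array([[21.0, -16.0, -6.0],
              [14.0,  60.0, 12.0],
              [ 7.0,   8.0, 38.0]])

# Disc i is centered at a_ii with radius equal to the off-diagonal absolute row sum.
centers = np.diag(A)
radii = np.abs(A).sum(axis=1) - np.abs(centers)

# Check that every computed eigenvalue lies in at least one disc.
eigs = np.linalg.eigvals(A)
in_some_disc = [bool(np.any(np.abs(lam - centers) <= radii + 1e-9)) for lam in eigs]

print("centers:", centers)
print("radii:  ", radii)
print("eigenvalues:", eigs, "covered:", in_some_disc)
```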

More can be said when the Gerschgorin disks are disjoint but this is an advanced topic which
requires the theory of functions of a complex variable. If you are interested and have a background
in complex variable techniques, this is in [11].

12.4 Exercises

1. State the eigenvalue problem from an algebraic perspective. 

2. State the eigenvalue problem from a geometric perspective. 

3. If $A$ is the matrix of a linear transformation which rotates all vectors in $\mathbb{R}^2$ through 30°, explain
why $A$ cannot have any real eigenvalues.

4. If $A$ is an $n \times n$ matrix and $c$ is a nonzero constant, compare the eigenvalues of $A$ and $cA$.

5. If $A$ is an invertible $n \times n$ matrix, compare the eigenvalues of $A$ and $A^{-1}$. More generally, for
$m$ an arbitrary integer, compare the eigenvalues of $A$ and $A^m$.

6. Let $A, B$ be invertible $n \times n$ matrices which commute. That is, $AB = BA$. Suppose $\mathbf{x}$ is an
eigenvector of $B$. Show that then $A\mathbf{x}$ must also be an eigenvector for $B$.

7. Suppose $A$ is an $n \times n$ matrix and it satisfies $A^m = A$ for some $m$ a positive integer larger
than 1. Show that if $\lambda$ is an eigenvalue of $A$ then $|\lambda|$ equals either 0 or 1.

8. Show that if $A\mathbf{x} = \lambda \mathbf{x}$ and $A\mathbf{y} = \lambda \mathbf{y}$, then whenever $a, b$ are scalars,

\[ A(a\mathbf{x} + b\mathbf{y}) = \lambda (a\mathbf{x} + b\mathbf{y}). \]

Does this imply that $a\mathbf{x} + b\mathbf{y}$ is an eigenvector? Explain.

9. Find the eigenvalues and eigenvectors of the matrix 




Determine whether the matrix is defective. 
10. Find the eigenvalues and eigenvectors of the matrix 




Determine whether the matrix is defective. 
11. Find the eigenvalues and eigenvectors of the matrix 




Determine whether the matrix is defective. 
12. Find the eigenvalues and eigenvectors of the matrix 




Determine whether the matrix is defective.
13. Find the eigenvalues and eigenvectors of the matrix




Determine whether the matrix is defective. 
14. Find the eigenvalues and eigenvectors of the matrix 




Determine whether the matrix is defective. 
15. Find the eigenvalues and eigenvectors of the matrix 




Determine whether the matrix is defective. 

16. Find the eigenvalues and eigenvectors of the matrix 

20 9 -18 
6 5-6 
30 14 -27 

Determine whether the matrix is defective. 

17. Find the eigenvalues and eigenvectors of the matrix 




Determine whether the matrix is defective. 
18. Find the eigenvalues and eigenvectors of the matrix 




Determine whether the matrix is defective. 
19. Find the eigenvalues and eigenvectors of the matrix

Determine whether the matrix is defective.
20. Find the eigenvalues and eigenvectors of the matrix 



2 1 
2 3 
2 2 



Determine whether the matrix is defective. 

21. Find the complex eigenvalues and eigenvectors of the matrix 

4 -2 -2 
2-2 
2 2 

22. Find the eigenvalues and eigenvectors of the matrix 

9 6-3 
6 
-3 -6 9
Determine whether the matrix is defective.

23. Find the complex eigenvalues and eigenvectors of the matrix


Determine whether the matrix is defective.

24. Find the complex eigenvalues and eigenvectors of the matrix


Determine whether the matrix is defective.

25. Find the complex eigenvalues and eigenvectors of the matrix


Determine whether the matrix is defective.

26. Find the complex eigenvalues and eigenvectors of the matrix


Determine whether the matrix is defective.



27. Let $A$ be a real $3 \times 3$ matrix which has a complex eigenvalue of the form $a + ib$ where $b \neq 0$.
Could $A$ be defective? Explain. Either give a proof or an example.

28. Let $T$ be the linear transformation which reflects vectors about the $x$ axis. Find a matrix for
$T$ and then find its eigenvalues and eigenvectors.

29. Let $T$ be the linear transformation which rotates all vectors in $\mathbb{R}^2$ counterclockwise through
an angle of $\pi/2$. Find a matrix of $T$ and then find eigenvalues and eigenvectors.

30. Let $A$ be the $2 \times 2$ matrix of the linear transformation which rotates all vectors in $\mathbb{R}^2$ through
an angle of $\theta$. For which values of $\theta$ does $A$ have a real eigenvalue?

31. Let $T$ be the linear transformation which reflects all vectors in $\mathbb{R}^3$ through the $xy$ plane. Find
a matrix for $T$ and then obtain its eigenvalues and eigenvectors.

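32. Find the principal direction for stretching for the matrix

\[ \begin{pmatrix} \frac{13}{9} & \frac{2}{15}\sqrt{5} & \frac{8}{45}\sqrt{5} \\ \frac{2}{15}\sqrt{5} & \frac{6}{5} & \frac{4}{15} \\ \frac{8}{45}\sqrt{5} & \frac{4}{15} & \frac{61}{45} \end{pmatrix} \]

The eigenvalues are 2 and 1.
33. Find the principal directions for the matrix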

/ 1 



V o 





1/ 



34. Suppose the migration matrix for three locations is

Find a comparison for the populations in the three locations after a long time.
35. Suppose the migration matrix for three locations is 




Find a comparison for the populations in the three locations after a long time. 

36. You own a trailer rental company in a large city and you have four locations, one in the South
East, one in the North East, one in the North West, and one in the South West. Denote these
locations by SE, NE, NW, and SW respectively. Suppose you observe that in a typical day, .8
of the trailers starting in SE stay in SE, .1 of the trailers in NE go to SE, .1 of the trailers
in NW end up in SE, .2 of the trailers in SW end up in SE, .1 of the trailers in SE end up
in NE, .7 of the trailers in NE end up in NE, .2 of the trailers in NW end up in NE, .1 of the
trailers in SW end up in NE, .1 of the trailers in SE end up in NW, .1 of the trailers in NE
end up in NW, .6 of the trailers in NW end up in NW, .2 of the trailers in SW end up in NW,
0 of the trailers in SE end up in SW, .1 of the trailers in NE end up in SW, .1 of the trailers in
NW end up in SW, .5 of the trailers in SW end up in SW. You begin with 20 trailers in each
location. Approximately how many will you have in each location after a long time? Will any
location ever run out of trailers?

37. Let $A$ be the $n \times n$, $n > 1$, matrix of the linear transformation which comes from the projection
$\mathbf{v} \mapsto \operatorname{proj}_{\mathbf{w}}(\mathbf{v})$. Show that $A$ cannot be invertible. Also show that $A$ has an eigenvalue equal to
1 and that for $\lambda$ an eigenvalue, $|\lambda| \leq 1$.

38. Let $\mathbf{v}$ be a unit vector in $\mathbb{R}^n$ and let $A = I - 2\mathbf{v}\mathbf{v}^T$. Show that $A$ has an eigenvalue equal to
$-1$.

39. Let $M$ be an $n \times n$ matrix and suppose $\mathbf{x}_1, \dots, \mathbf{x}_n$ are $n$ eigenvectors which form a linearly
independent set. Form the matrix $S$ by making the columns these vectors. Show that $S^{-1}$
exists and that $S^{-1} M S$ is a diagonal matrix (one having zeros everywhere except on the
main diagonal) having the eigenvalues of $M$ on the main diagonal. When this can be done the
matrix is diagonalizable. This is presented in the text. You should write it down in your
own words filling in the details without looking at the text.

40. Show that a matrix $M$ is diagonalizable if and only if it has a basis of eigenvectors. Hint:
The first part is done in Problem 39. It only remains to show that if the matrix can be
diagonalized by some matrix $S$ giving $D = S^{-1} M S$ for $D$ a diagonal matrix, then it has a
basis of eigenvectors. Try using the columns of the matrix $S$. Like the last problem, you should
try to do this yourself without consulting the text. These problems are a nice review of the
meaning of matrix multiplication.

41. Suppose $A$ is an $n \times n$ matrix which is diagonally dominant. This means

\[ |a_{ii}| > \sum_{j \neq i} |a_{ij}| \]

for each $i$. Show that $A^{-1}$ must exist.

42. Is it possible for a nonzero matrix to have only 0 as an eigenvalue?

43. Let $M$ be an $n \times n$ matrix. Then define the adjoint of $M$, denoted by $M^*$, to be the transpose
of the conjugate of $M$. For example,

\[ \begin{pmatrix} 2 & i \\ 1 + i & 3 \end{pmatrix}^* = \begin{pmatrix} 2 & 1 - i \\ -i & 3 \end{pmatrix} \]

A matrix $M$ is self adjoint if $M^* = M$. Show the eigenvalues of a self adjoint matrix are all
real. If the self adjoint matrix has all real entries, it is called symmetric.

44. Suppose $A$ is an $n \times n$ matrix consisting entirely of real entries but $a + ib$ is a complex eigenvalue
having the eigenvector $\mathbf{x} + i\mathbf{y}$. Here $\mathbf{x}$ and $\mathbf{y}$ are real vectors. Show that then $a - ib$ is also an
eigenvalue with the eigenvector $\mathbf{x} - i\mathbf{y}$. Hint: You should remember that the conjugate of a
product of complex numbers equals the product of the conjugates. Here $a + ib$ is a complex
number whose conjugate equals $a - ib$.

45. Recall an $n \times n$ matrix is said to be symmetric if it has all real entries and if $A = A^T$. Show
the eigenvectors and eigenvalues of a real symmetric matrix are real.

46. Recall an $n \times n$ matrix is said to be skew symmetric if it has all real entries and if $A = -A^T$.
Show that any nonzero eigenvalues must be of the form $ib$ where $i^2 = -1$. In words, the
eigenvalues are either 0 or pure imaginary. Show also that the eigenvectors corresponding to
the pure imaginary eigenvalues are imaginary in the sense that every entry is of the form $ix$
for $x \in \mathbb{R}$.

47. A discrete dynamical system is of the form

\[ \mathbf{x}(k + 1) = A\mathbf{x}(k), \quad \mathbf{x}(0) = \mathbf{x}_0 \]

where $A$ is an $n \times n$ matrix and $\mathbf{x}(k)$ is a vector in $\mathbb{R}^n$. Show first that

\[ \mathbf{x}(k) = A^k \mathbf{x}_0 \]

for all $k \geq 1$. If $A$ is nondefective so that it has a basis of eigenvectors, $\{\mathbf{v}_1, \dots, \mathbf{v}_n\}$ where

\[ A\mathbf{v}_j = \lambda_j \mathbf{v}_j \]

you can write the initial condition $\mathbf{x}_0$ in a unique way as a linear combination of these
eigenvectors. Thus

\[ \mathbf{x}_0 = \sum_{j=1}^{n} a_j \mathbf{v}_j \]

Now explain why

\[ \mathbf{x}(k) = \sum_{j=1}^{n} a_j A^k \mathbf{v}_j = \sum_{j=1}^{n} a_j \lambda_j^k \mathbf{v}_j \]

which gives a formula for $\mathbf{x}(k)$, the solution of the dynamical system.

48. Suppose A is an n x n matrix and let v be an eigenvector such that $Av = \lambda v$. Also suppose
the characteristic polynomial of A is

\[ \det(\lambda I - A) = \lambda^n + a_{n-1}\lambda^{n-1} + \cdots + a_1 \lambda + a_0. \]

Explain why

\[ \left( A^n + a_{n-1}A^{n-1} + \cdots + a_1 A + a_0 I \right) v = 0. \]

If A is nondefective, give a very easy proof of the Cayley Hamilton theorem based on this.
Recall this theorem says A satisfies its characteristic equation,

\[ A^n + a_{n-1}A^{n-1} + \cdots + a_1 A + a_0 I = 0. \]

49. Suppose an n x n nondefective matrix A has only 1 and -1 as eigenvalues. Find $A^{12}$.

50. Suppose the characteristic polynomial of an n x n matrix A is $1 - \lambda^n$. Find $A^{mn}$ where m is
an integer. Hint: Note first that A is nondefective. Why?

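51. Sometimes sequences come in terms of a recursion formula. An example is the Fibonacci
sequence.

\[ x_0 = 1 = x_1, \quad x_{n+1} = x_n + x_{n-1}. \]

Show this can be considered as a discrete dynamical system as follows.

\[ \begin{pmatrix} x_{n+1} \\ x_n \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} x_n \\ x_{n-1} \end{pmatrix}, \quad \begin{pmatrix} x_1 \\ x_0 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \end{pmatrix}. \]

Now use the technique of Problem 47 to find a formula for $x_n$.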
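
Before looking for a closed formula, it can help to confirm numerically that the matrix form really reproduces the recursion. The following is a minimal sketch in Python which assumes only the recursion and initial condition stated in Problem 51; the helper names `mat_vec`, `fibonacci_state` and `fibonacci_direct` are illustrative and do not come from the text.

```python
def mat_vec(A, x):
    """Multiply a matrix (a list of rows) by a vector (a list)."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def fibonacci_state(n):
    """Return [x_{n+1}, x_n] by applying A = [[1, 1], [1, 0]] n times to [x_1, x_0]."""
    A = [[1, 1], [1, 0]]
    state = [1, 1]  # initial condition (x_1, x_0) = (1, 1)
    for _ in range(n):
        state = mat_vec(A, state)
    return state


def fibonacci_direct(n):
    """Return x_n straight from the recursion x_{n+1} = x_n + x_{n-1}."""
    prev, curr = 1, 1  # x_0, x_1
    for _ in range(n):
        prev, curr = curr, prev + curr
    return prev


# The second component of the state after n steps should agree with x_n.
for n in range(10):
    print(n, fibonacci_state(n)[1], fibonacci_direct(n))
```

The loop in `fibonacci_state` is just $x(k) = A^k x_0$ from Problem 47 carried out one multiplication at a time.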

52. Let A be an n x n matrix having characteristic polynomial

\[ \det(\lambda I - A) = \lambda^n + a_{n-1}\lambda^{n-1} + \cdots + a_1 \lambda + a_0. \]

Show that $a_0 = (-1)^n \det(A)$.

53. Find ( \ J ) • Next find 

3 

lim ( 2 i 



??.— >oo \ — ^ U 
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54. Find e where A is the matrix 



2 1 

2 1 ) in the above problem. 
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Matrices And The Inner Product 

13.1 Symmetric And Orthogonal Matrices 

13.1.1 Orthogonal Matrices 

Remember that finding the inverse of a matrix was often a long process, whereas it was very easy
to take the transpose of a matrix. For some matrices, the transpose equals the inverse. When such
a matrix has all real entries, it is called an orthogonal matrix. Recall the following
definition given earlier.

Definition 13.1.1 A real n x n matrix U is called an orthogonal matrix if $UU^T = U^TU = I$.

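Example 13.1.2 Show the matrix

\[ U = \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \end{pmatrix} \]

is orthogonal.

\[ UU^T = \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \end{pmatrix} \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}. \]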

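Example 13.1.3 Let

\[ U = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & -1 & 0 \end{pmatrix}. \]

Is U orthogonal?

The answer is yes. This is because the columns form an orthonormal set of vectors as well as the
rows. As discussed above this is equivalent to $U^TU = I$.

\[ U^TU = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & -1 & 0 \end{pmatrix}^T \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & -1 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}. \]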
When you say that U is orthogonal, you are saying that

\[ \sum_j U_{ij} U^T_{jk} = \sum_j U_{ij} U_{kj} = \delta_{ik}. \]

In words, the dot product of the $i^{th}$ row of U with the $k^{th}$ row gives 1 if $i = k$ and 0 if $i \neq k$. The
same is true of the columns because $U^TU = I$ also. Therefore,

\[ \sum_j U^T_{ij} U_{jk} = \sum_j U_{ji} U_{jk} = \delta_{ik} \]

which says that one column dotted with another column gives 1 if the two columns are the same
and 0 if the two columns are different.

More succinctly, this states that if $u_1, \dots, u_n$ are the columns of U, an orthogonal matrix, then

\[ u_i \cdot u_j = \delta_{ij} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases} \tag{13.1} \]

Definition 13.1.4 A set of vectors, $\{u_1, \dots, u_n\}$ is said to be an orthonormal set if 13.1 holds.

Theorem 13.1.5 If $\{u_1, \dots, u_m\}$ is an orthonormal set of vectors then it is linearly independent.

Proof: Using the properties of the dot product,

\[ \mathbf{0} \cdot u = (\mathbf{0} + \mathbf{0}) \cdot u = \mathbf{0} \cdot u + \mathbf{0} \cdot u \]

and so, subtracting $\mathbf{0} \cdot u$ from both sides yields $\mathbf{0} \cdot u = 0$. Now suppose $\sum_j c_j u_j = \mathbf{0}$. Then from
the properties of the dot product,

\[ c_k = \sum_j c_j \delta_{jk} = \sum_j c_j (u_j \cdot u_k) = \Big( \sum_j c_j u_j \Big) \cdot u_k = \mathbf{0} \cdot u_k = 0. \]

Since k was arbitrary, this shows that each $c_k = 0$ and this has shown that if $\sum_j c_j u_j = \mathbf{0}$, then
each $c_j = 0$. This is what it means for the set of vectors to be linearly independent. ■



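Example 13.1.6 Let

\[ U = \begin{pmatrix} \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{6}} \\ \frac{1}{\sqrt{3}} & -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{6}} \\ \frac{1}{\sqrt{3}} & 0 & -\frac{\sqrt{6}}{3} \end{pmatrix}. \]

Is U an orthogonal matrix?

The answer is yes. This is because the columns (rows) form an orthonormal set of vectors.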

The importance of orthogonal matrices is that they change components of vectors relative to
different Cartesian coordinate systems. Geometrically, the orthogonal matrices are exactly those
which preserve all distances in the sense that if $x \in \mathbb{R}^n$ and U is orthogonal, then $\|Ux\| = \|x\|$
because

\[ \|Ux\|^2 = (Ux)^T Ux = x^T U^T U x = x^T I x = \|x\|^2. \]

Observation 13.1.7 Suppose U is an orthogonal matrix. Then $\det(U) = \pm 1$.

This is easy to see from the properties of determinants. Thus

\[ \det(U)^2 = \det(U^T) \det(U) = \det(U^T U) = \det(I) = 1. \]

Orthogonal matrices are divided into two classes, proper and improper. The proper orthogonal 
matrices are those whose determinant equals 1 and the improper ones are those whose determinants 
equal -1. The reason for the distinction is that the improper orthogonal matrices are sometimes
considered to have no physical significance since they cause a change in orientation which would 
correspond to material passing through itself in a non physical manner. Thus in considering which 
coordinate systems must be considered in certain applications, you only need to consider those 
which are related by a proper orthogonal transformation. Geometrically, the linear transformations 
determined by the proper orthogonal matrices correspond to the composition of rotations. 
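
Both facts, preservation of length and determinant equal to $\pm 1$, can be seen on the simplest possible cases. The sketch below uses a 2 x 2 rotation as a proper orthogonal matrix and a reflection across the first coordinate axis as an improper one. The angle `theta`, the test vector `x` and all function names are arbitrary choices made for illustration, not taken from the text.

```python
import math


def apply(U, x):
    """Return the vector Ux for a matrix U given as a list of rows."""
    return [sum(u * xi for u, xi in zip(row, x)) for row in U]


def norm(x):
    """Euclidean length of x."""
    return math.sqrt(sum(xi * xi for xi in x))


def det2(U):
    """Determinant of a 2 x 2 matrix."""
    return U[0][0] * U[1][1] - U[0][1] * U[1][0]


theta = 0.7  # any angle works
rotation = [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta), math.cos(theta)]]  # proper: determinant 1
reflection = [[1.0, 0.0],
              [0.0, -1.0]]                       # improper: determinant -1

x = [3.0, 4.0]  # a vector of length 5
for name, U in (("rotation", rotation), ("reflection", reflection)):
    print(name, "det =", det2(U), "|Ux| =", norm(apply(U, x)), "|x| =", norm(x))
```

The reflection preserves lengths just as well as the rotation does; the only thing which distinguishes the two classes is the sign of the determinant.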

13.1.2 Symmetric And Skew Symmetric Matrices 

Definition 13.1.8 A real n x n matrix A is symmetric if $A^T = A$. If $A = -A^T$, then A is called
skew symmetric.

Theorem 13.1.9 The eigenvalues of a real symmetric matrix are real. The eigenvalues of a real
skew symmetric matrix are 0 or pure imaginary.

Proof: The proof of this theorem is in [11]. It is best understood as a special case of more
general considerations. However, here is a proof in this special case.

Recall that for a complex number $a + ib$, the complex conjugate, denoted by $\overline{a + ib}$, is given by
the formula $\overline{a + ib} = a - ib$. The notation $\overline{x}$ will denote the vector which has every entry replaced
by its complex conjugate.

Suppose A is a real symmetric matrix and $Ax = \lambda x$. Then

\[ \overline{\lambda}\, \overline{x}^T x = \left( \overline{Ax} \right)^T x = \overline{x}^T A^T x = \overline{x}^T A x = \lambda\, \overline{x}^T x. \]

Dividing by $\overline{x}^T x$ on both sides yields $\overline{\lambda} = \lambda$ which says $\lambda$ is real. (Why?)
Next suppose $A = -A^T$ so A is skew symmetric and $Ax = \lambda x$. Then

\[ \overline{\lambda}\, \overline{x}^T x = \left( \overline{Ax} \right)^T x = \overline{x}^T A^T x = -\overline{x}^T A x = -\lambda\, \overline{x}^T x \]

and so, dividing by $\overline{x}^T x$ as before, $\overline{\lambda} = -\lambda$. Letting $\lambda = a + ib$, this means $a - ib = -a - ib$ and so
$a = 0$. Thus $\lambda$ is pure imaginary. ■
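
For a 2 x 2 matrix the theorem can be watched in action, because the eigenvalues are the roots of $\lambda^2 - \operatorname{tr}(A)\lambda + \det(A) = 0$ and the quadratic formula gives them directly. The following sketch is written under that assumption only. The two test matrices are chosen here for illustration, one symmetric and one skew symmetric, and the function name is not from the text.

```python
import cmath


def eigenvalues_2x2(A):
    """Roots of the characteristic polynomial t^2 - tr(A) t + det(A) of a 2 x 2 matrix."""
    (a, b), (c, d) = A
    trace = a + d
    det = a * d - b * c
    root = cmath.sqrt(trace * trace - 4 * det)  # complex square root, so negative discriminants are fine
    return ((trace + root) / 2, (trace - root) / 2)


symmetric = [[1, 2],
             [2, 3]]   # equals its transpose
skew = [[0, -1],
        [1, 0]]        # equals minus its transpose

sym_eigs = eigenvalues_2x2(symmetric)
skew_eigs = eigenvalues_2x2(skew)
print("symmetric:", sym_eigs)   # imaginary parts vanish
print("skew symmetric:", skew_eigs)  # real parts vanish
```

For a real symmetric matrix the discriminant is $(a - d)^2 + 4b^2 \geq 0$, which is the 2 x 2 shadow of the first half of the theorem.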

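Example 13.1.10 Let $A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$. This is a skew symmetric matrix. Find its eigenvalues.

Its eigenvalues are obtained by solving the equation $\det \begin{pmatrix} \lambda & 1 \\ -1 & \lambda \end{pmatrix} = \lambda^2 + 1 = 0$. You see the
eigenvalues are $\pm i$, pure imaginary.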

Example 13.1.11 Let $A = \begin{pmatrix} 1 & 2 \\ 2 & 3 \end{pmatrix}$. This is a symmetric matrix. Find its eigenvalues.

Its eigenvalues are obtained by solving the equation, $\det \begin{pmatrix} \lambda - 1 & -2 \\ -2 & \lambda - 3 \end{pmatrix} = -1 - 4\lambda + \lambda^2 = 0$
and the solution is $\lambda = 2 + \sqrt{5}$ and $\lambda = 2 - \sqrt{5}$.

Definition 13.1.12 An n x n matrix $A = (a_{ij})$ is called a diagonal matrix if $a_{ij} = 0$ whenever
$i \neq j$. For example, a diagonal matrix is of the form indicated below where * denotes a number.

\[ \begin{pmatrix} * & 0 & \cdots & 0 \\ 0 & * & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & * \end{pmatrix} \]


Theorem 13.1.13 Let A be a real symmetric matrix. Then there exists an orthogonal matrix U
such that $U^TAU$ is a diagonal matrix. Moreover, the diagonal entries are the eigenvalues of A.

Proof: The proof is given later. 

Corollary 13.1.14 If A is a real n x n symmetric matrix, then there exists an orthonormal set of
eigenvectors, $\{u_1, \dots, u_n\}$.

Proof: Since A is symmetric, then by Theorem 13.1.13, there exists an orthogonal matrix U
such that $U^TAU = D$, a diagonal matrix whose diagonal entries are the eigenvalues of A. Therefore,
since A is symmetric and all the matrices are real,

\[ \overline{D} = \overline{D^T} = \overline{U^T A^T U} = U^T A^T U = U^T A U = D \]

showing D is real because each entry of D equals its complex conjugate.$^1$
Finally, let

\[ U = \begin{pmatrix} u_1 & u_2 & \cdots & u_n \end{pmatrix} \]

where the $u_i$ denote the columns of U and

\[ D = \begin{pmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{pmatrix}. \]

The equation, $U^TAU = D$ implies

\[ AU = \begin{pmatrix} Au_1 & Au_2 & \cdots & Au_n \end{pmatrix} = UD = \begin{pmatrix} \lambda_1 u_1 & \lambda_2 u_2 & \cdots & \lambda_n u_n \end{pmatrix} \]

where the entries denote the columns of AU and UD respectively. Therefore, $Au_i = \lambda_i u_i$ and since
the matrix is orthogonal, the $ij^{th}$ entry of $U^TU$ equals $\delta_{ij}$ and so

\[ \delta_{ij} = u_i^T u_j = u_i \cdot u_j. \]

This proves the corollary because it shows the vectors $\{u_i\}$ form an orthonormal basis. ■

$^1$ Recall that for a complex number, $x + iy$, the complex conjugate, denoted by $\overline{x + iy}$ is defined as $x - iy$.
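Example 13.1.15 Find the eigenvalues and an orthonormal basis of eigenvectors for the matrix

\[ A = \begin{pmatrix} \frac{19}{9} & -\frac{8}{15}\sqrt{5} & \frac{2}{45}\sqrt{5} \\ -\frac{8}{15}\sqrt{5} & -\frac{1}{5} & -\frac{16}{15} \\ \frac{2}{45}\sqrt{5} & -\frac{16}{15} & \frac{94}{45} \end{pmatrix} \]

given that the eigenvalues are 3, -1, and 2.

The augmented matrix which needs to be row reduced to find the eigenvectors for $\lambda = 3$ is

\[ \left( \begin{array}{ccc|c} \frac{19}{9} - 3 & -\frac{8}{15}\sqrt{5} & \frac{2}{45}\sqrt{5} & 0 \\ -\frac{8}{15}\sqrt{5} & -\frac{1}{5} - 3 & -\frac{16}{15} & 0 \\ \frac{2}{45}\sqrt{5} & -\frac{16}{15} & \frac{94}{45} - 3 & 0 \end{array} \right) \]

and the row reduced echelon form for this is

\[ \left( \begin{array}{ccc|c} 1 & 0 & -\frac{1}{2}\sqrt{5} & 0 \\ 0 & 1 & \frac{3}{4} & 0 \\ 0 & 0 & 0 & 0 \end{array} \right). \]

Therefore, eigenvectors for $\lambda = 3$ are

\[ z \begin{pmatrix} \frac{1}{2}\sqrt{5} \\ -\frac{3}{4} \\ 1 \end{pmatrix} \]

where $z \neq 0$.

The augmented matrix which must be row reduced to find the eigenvectors for $\lambda = -1$ is

\[ \left( \begin{array}{ccc|c} \frac{19}{9} + 1 & -\frac{8}{15}\sqrt{5} & \frac{2}{45}\sqrt{5} & 0 \\ -\frac{8}{15}\sqrt{5} & -\frac{1}{5} + 1 & -\frac{16}{15} & 0 \\ \frac{2}{45}\sqrt{5} & -\frac{16}{15} & \frac{94}{45} + 1 & 0 \end{array} \right) \]

and the row reduced echelon form is

\[ \left( \begin{array}{ccc|c} 1 & 0 & -\frac{1}{2}\sqrt{5} & 0 \\ 0 & 1 & -3 & 0 \\ 0 & 0 & 0 & 0 \end{array} \right). \]

Therefore, the eigenvectors for $\lambda = -1$ are

\[ z \begin{pmatrix} \frac{1}{2}\sqrt{5} \\ 3 \\ 1 \end{pmatrix}, \quad z \neq 0. \]

The augmented matrix which must be row reduced to find the eigenvectors for $\lambda = 2$ is

\[ \left( \begin{array}{ccc|c} \frac{19}{9} - 2 & -\frac{8}{15}\sqrt{5} & \frac{2}{45}\sqrt{5} & 0 \\ -\frac{8}{15}\sqrt{5} & -\frac{1}{5} - 2 & -\frac{16}{15} & 0 \\ \frac{2}{45}\sqrt{5} & -\frac{16}{15} & \frac{94}{45} - 2 & 0 \end{array} \right) \]

and its row reduced echelon form is

\[ \left( \begin{array}{ccc|c} 1 & 0 & \frac{2}{5}\sqrt{5} & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{array} \right) \]

so the eigenvectors for $\lambda = 2$ are

\[ z \begin{pmatrix} -\frac{2}{5}\sqrt{5} \\ 0 \\ 1 \end{pmatrix}, \quad z \neq 0. \]

It remains to find an orthonormal basis. You can check that the dot product of any of these
vectors with another of them gives zero and so it suffices to choose z in each case such that the resulting
vector has length 1. First consider the vectors for $\lambda = 3$. It is required to choose z such that

\[ z \begin{pmatrix} \frac{1}{2}\sqrt{5} \\ -\frac{3}{4} \\ 1 \end{pmatrix} \]

is a unit vector. In other words, you need

\[ z \begin{pmatrix} \frac{1}{2}\sqrt{5} \\ -\frac{3}{4} \\ 1 \end{pmatrix} \cdot z \begin{pmatrix} \frac{1}{2}\sqrt{5} \\ -\frac{3}{4} \\ 1 \end{pmatrix} = 1. \]

But the above dot product equals $\frac{45}{16} z^2$ and this equals 1 when $z = \frac{4}{15}\sqrt{5}$. Therefore, the eigenvector
which is desired is

\[ \frac{4}{15}\sqrt{5} \begin{pmatrix} \frac{1}{2}\sqrt{5} \\ -\frac{3}{4} \\ 1 \end{pmatrix} = \begin{pmatrix} \frac{2}{3} \\ -\frac{1}{5}\sqrt{5} \\ \frac{4}{15}\sqrt{5} \end{pmatrix}. \]

Next find the eigenvector for $\lambda = -1$. The same process requires that $1 = \frac{45}{4} z^2$ which happens
when $z = \frac{2}{15}\sqrt{5}$. Therefore, an eigenvector for $\lambda = -1$ which has unit length is

\[ \frac{2}{15}\sqrt{5} \begin{pmatrix} \frac{1}{2}\sqrt{5} \\ 3 \\ 1 \end{pmatrix} = \begin{pmatrix} \frac{1}{3} \\ \frac{2}{5}\sqrt{5} \\ \frac{2}{15}\sqrt{5} \end{pmatrix}. \]

Finally, consider $\lambda = 2$. This time you need $1 = \frac{9}{5} z^2$ which occurs when $z = \frac{1}{3}\sqrt{5}$. Therefore, the
eigenvector is

\[ \frac{1}{3}\sqrt{5} \begin{pmatrix} -\frac{2}{5}\sqrt{5} \\ 0 \\ 1 \end{pmatrix} = \begin{pmatrix} -\frac{2}{3} \\ 0 \\ \frac{1}{3}\sqrt{5} \end{pmatrix}. \]

Now recall that the vectors form an orthonormal set of vectors if the matrix having them as
columns is orthogonal. That matrix is

\[ \begin{pmatrix} \frac{2}{3} & \frac{1}{3} & -\frac{2}{3} \\ -\frac{1}{5}\sqrt{5} & \frac{2}{5}\sqrt{5} & 0 \\ \frac{4}{15}\sqrt{5} & \frac{2}{15}\sqrt{5} & \frac{1}{3}\sqrt{5} \end{pmatrix}. \]

Is this orthogonal? To find out, multiply by its transpose. Thus

\[ \begin{pmatrix} \frac{2}{3} & \frac{1}{3} & -\frac{2}{3} \\ -\frac{1}{5}\sqrt{5} & \frac{2}{5}\sqrt{5} & 0 \\ \frac{4}{15}\sqrt{5} & \frac{2}{15}\sqrt{5} & \frac{1}{3}\sqrt{5} \end{pmatrix} \begin{pmatrix} \frac{2}{3} & -\frac{1}{5}\sqrt{5} & \frac{4}{15}\sqrt{5} \\ \frac{1}{3} & \frac{2}{5}\sqrt{5} & \frac{2}{15}\sqrt{5} \\ -\frac{2}{3} & 0 & \frac{1}{3}\sqrt{5} \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}. \]

Since the identity was obtained this shows the above matrix is orthogonal and that therefore, the
columns form an orthonormal set of vectors. The problem asks for you to find an orthonormal basis.
However, you will show in Problem 23 that an orthonormal set of n vectors in $\mathbb{R}^n$ is always a basis.
Therefore, since there are three of these vectors, they must constitute a basis.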
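Example 13.1.16 Find an orthonormal set of three eigenvectors for the matrix

\[ A = \begin{pmatrix} \frac{13}{9} & \frac{2}{15}\sqrt{5} & \frac{8}{45}\sqrt{5} \\ \frac{2}{15}\sqrt{5} & \frac{6}{5} & \frac{4}{15} \\ \frac{8}{45}\sqrt{5} & \frac{4}{15} & \frac{61}{45} \end{pmatrix} \]

given the eigenvalues are 2, and 1.

The eigenvectors which go with $\lambda = 2$ are obtained from row reducing the matrix

\[ \left( \begin{array}{ccc|c} \frac{13}{9} - 2 & \frac{2}{15}\sqrt{5} & \frac{8}{45}\sqrt{5} & 0 \\ \frac{2}{15}\sqrt{5} & \frac{6}{5} - 2 & \frac{4}{15} & 0 \\ \frac{8}{45}\sqrt{5} & \frac{4}{15} & \frac{61}{45} - 2 & 0 \end{array} \right) \]

and its row reduced echelon form is

\[ \left( \begin{array}{ccc|c} 1 & 0 & -\frac{1}{2}\sqrt{5} & 0 \\ 0 & 1 & -\frac{3}{4} & 0 \\ 0 & 0 & 0 & 0 \end{array} \right) \]

which shows the eigenvectors for $\lambda = 2$ are

\[ z \begin{pmatrix} \frac{1}{2}\sqrt{5} \\ \frac{3}{4} \\ 1 \end{pmatrix} \]

and a choice for z which will produce a unit vector is $z = \frac{4}{15}\sqrt{5}$. Therefore, the vector we want is

\[ \begin{pmatrix} \frac{2}{3} \\ \frac{1}{5}\sqrt{5} \\ \frac{4}{15}\sqrt{5} \end{pmatrix}. \]

Next consider the eigenvectors for $\lambda = 1$. The matrix which must be row reduced is

\[ \left( \begin{array}{ccc|c} \frac{13}{9} - 1 & \frac{2}{15}\sqrt{5} & \frac{8}{45}\sqrt{5} & 0 \\ \frac{2}{15}\sqrt{5} & \frac{6}{5} - 1 & \frac{4}{15} & 0 \\ \frac{8}{45}\sqrt{5} & \frac{4}{15} & \frac{61}{45} - 1 & 0 \end{array} \right) \]

and its row reduced echelon form is

\[ \left( \begin{array}{ccc|c} 1 & \frac{3}{10}\sqrt{5} & \frac{2}{5}\sqrt{5} & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{array} \right). \]

Therefore, the eigenvectors are of the form

\[ \begin{pmatrix} -\frac{3}{10}\sqrt{5}\, y - \frac{2}{5}\sqrt{5}\, z \\ y \\ z \end{pmatrix}. \]

This is a two dimensional eigenspace.

Before going further, we want to point out that no matter how we choose y and z the resulting
vector will be orthogonal to the eigenvector for $\lambda = 2$. This is a special case of a general result
which states that eigenvectors for distinct eigenvalues of a symmetric matrix are orthogonal. This
is explained in Problem 15. For this case you need to show the following dot product equals zero.

\[ \begin{pmatrix} \frac{2}{3} \\ \frac{1}{5}\sqrt{5} \\ \frac{4}{15}\sqrt{5} \end{pmatrix} \cdot \begin{pmatrix} -\frac{3}{10}\sqrt{5}\, y - \frac{2}{5}\sqrt{5}\, z \\ y \\ z \end{pmatrix} \tag{13.2} \]

This is left for you to do.

Continuing with the task of finding an orthonormal basis, let $y = 0$ first. This results in
eigenvectors of the form

\[ \begin{pmatrix} -\frac{2}{5}\sqrt{5}\, z \\ 0 \\ z \end{pmatrix} \]

and letting $z = \frac{1}{3}\sqrt{5}$ you obtain a unit vector. Thus the second vector will be

\[ \begin{pmatrix} -\frac{2}{5}\sqrt{5} \left( \frac{1}{3}\sqrt{5} \right) \\ 0 \\ \frac{1}{3}\sqrt{5} \end{pmatrix} = \begin{pmatrix} -\frac{2}{3} \\ 0 \\ \frac{1}{3}\sqrt{5} \end{pmatrix}. \]

It remains to find the third vector in the orthonormal basis. This merely involves choosing y and
z in 13.2 in such a way that the resulting vector has dot product with the two given vectors equal
to zero. Thus you need

\[ \begin{pmatrix} -\frac{3}{10}\sqrt{5}\, y - \frac{2}{5}\sqrt{5}\, z \\ y \\ z \end{pmatrix} \cdot \begin{pmatrix} -\frac{2}{3} \\ 0 \\ \frac{1}{3}\sqrt{5} \end{pmatrix} = \frac{1}{5}\sqrt{5}\, y + \frac{3}{5}\sqrt{5}\, z = 0. \]

The dot product with the eigenvector for $\lambda = 2$ is automatically equal to zero and so all that you
need is the above equation. This is satisfied when $z = -\frac{1}{3} y$. Therefore, the vector we want is of the
form

\[ \begin{pmatrix} -\frac{3}{10}\sqrt{5}\, y - \frac{2}{5}\sqrt{5} \left( -\frac{1}{3} y \right) \\ y \\ -\frac{1}{3} y \end{pmatrix} = \begin{pmatrix} -\frac{1}{6}\sqrt{5}\, y \\ y \\ -\frac{1}{3} y \end{pmatrix} \]

and it only remains to choose y in such a way that this vector has unit length. This occurs when
$y = \frac{2}{5}\sqrt{5}$. Therefore, the vector we want is

\[ \begin{pmatrix} -\frac{1}{3} \\ \frac{2}{5}\sqrt{5} \\ -\frac{2}{15}\sqrt{5} \end{pmatrix}. \]

The three eigenvectors which constitute an orthonormal basis are

\[ \begin{pmatrix} -\frac{1}{3} \\ \frac{2}{5}\sqrt{5} \\ -\frac{2}{15}\sqrt{5} \end{pmatrix}, \quad \begin{pmatrix} -\frac{2}{3} \\ 0 \\ \frac{1}{3}\sqrt{5} \end{pmatrix}, \quad \text{and} \quad \begin{pmatrix} \frac{2}{3} \\ \frac{1}{5}\sqrt{5} \\ \frac{4}{15}\sqrt{5} \end{pmatrix}. \]

To check our work and see if this is really an orthonormal set of vectors, we make them the
columns of a matrix and see if the resulting matrix is orthogonal. The matrix is

\[ \begin{pmatrix} -\frac{1}{3} & -\frac{2}{3} & \frac{2}{3} \\ \frac{2}{5}\sqrt{5} & 0 & \frac{1}{5}\sqrt{5} \\ -\frac{2}{15}\sqrt{5} & \frac{1}{3}\sqrt{5} & \frac{4}{15}\sqrt{5} \end{pmatrix}. \]

This matrix times its transpose equals

\[ \begin{pmatrix} -\frac{1}{3} & -\frac{2}{3} & \frac{2}{3} \\ \frac{2}{5}\sqrt{5} & 0 & \frac{1}{5}\sqrt{5} \\ -\frac{2}{15}\sqrt{5} & \frac{1}{3}\sqrt{5} & \frac{4}{15}\sqrt{5} \end{pmatrix} \begin{pmatrix} -\frac{1}{3} & \frac{2}{5}\sqrt{5} & -\frac{2}{15}\sqrt{5} \\ -\frac{2}{3} & 0 & \frac{1}{3}\sqrt{5} \\ \frac{2}{3} & \frac{1}{5}\sqrt{5} & \frac{4}{15}\sqrt{5} \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \]

and so this is indeed an orthonormal basis.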

Because of the repeated eigenvalue, there would have been many other orthonormal bases which
could have been obtained. It was pretty arbitrary to take $y = 0$ in the above argument. We
could just as easily have taken $z = 0$ or even $y = z = 1$. Any such change would have resulted in a
different orthonormal basis. Geometrically, what is happening is the eigenspace for $\lambda = 1$ was two
dimensional. It can be visualized as a plane in three dimensional space which passes through the
origin. There are infinitely many different pairs of perpendicular unit vectors in this plane.
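
One way to see this freedom concretely is to rotate an orthonormal pair inside the plane. The sketch below assumes the matrix of Example 13.1.16 has the entries written in the code (they are read from the example and should be treated as an assumption), starts from one orthonormal pair of eigenvectors for the eigenvalue 1, and checks that every rotated pair is again an orthonormal pair of eigenvectors. All names and the sample angles are illustrative.

```python
import math

S5 = math.sqrt(5.0)

# Matrix of Example 13.1.16 (entries as read from the example; an assumption).
A = [[13 / 9, 2 * S5 / 15, 8 * S5 / 45],
     [2 * S5 / 15, 6 / 5, 4 / 15],
     [8 * S5 / 45, 4 / 15, 61 / 45]]

# One orthonormal pair spanning the eigenspace for the eigenvalue 1.
U1 = [-2 / 3, 0.0, S5 / 3]
U2 = [-1 / 3, 2 * S5 / 5, -2 * S5 / 15]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def mat_vec(M, x):
    return [dot(row, x) for row in M]


def rotated_pair(theta):
    """Rotate (U1, U2) by the angle theta inside the plane they span."""
    c, s = math.cos(theta), math.sin(theta)
    p = [c * a + s * b for a, b in zip(U1, U2)]
    q = [-s * a + c * b for a, b in zip(U1, U2)]
    return p, q


def is_orthonormal_eigenpair(p, q, eigenvalue=1.0, tol=1e-12):
    """Check A p = eigenvalue * p, A q = eigenvalue * q, p . q = 0 and |p| = |q| = 1."""
    Ap, Aq = mat_vec(A, p), mat_vec(A, q)
    residual = max(abs(a - eigenvalue * b) for a, b in zip(Ap + Aq, p + q))
    return (residual < tol and abs(dot(p, q)) < tol
            and abs(dot(p, p) - 1) < tol and abs(dot(q, q) - 1) < tol)


for theta in (0.0, 0.3, 1.0, 2.5):
    print(theta, is_orthonormal_eigenpair(*rotated_pair(theta)))
```

Each value of the angle gives a different orthonormal basis for the same eigenspace.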

13.1.3 Diagonalizing A Symmetric Matrix 

Recall the following definition: 



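Definition 13.1.17 An n x n matrix $A = (a_{ij})$ is called a diagonal matrix if $a_{ij} = 0$ whenever
$i \neq j$. For example, a diagonal matrix is of the form indicated below where * denotes a number.

\[ \begin{pmatrix} * & 0 & \cdots & 0 \\ 0 & * & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & * \end{pmatrix} \]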

Definition 13.1.18 An n x n matrix A is said to be non defective or diagonalizable if there
exists an invertible matrix S such that $S^{-1}AS = D$ where D is a diagonal matrix as described above.

Some matrices are non defective and some are not. As indicated in Theorem 13.1.13 if A is a real
symmetric matrix, there exists an orthogonal matrix U such that $U^TAU = D$ a diagonal matrix.
Therefore, every symmetric matrix is non defective because if U is an orthogonal matrix, its inverse
is $U^T$. In the following example, this orthogonal matrix will be found.



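Example 13.1.19 Let

\[ A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \frac{3}{2} & \frac{1}{2} \\ 0 & \frac{1}{2} & \frac{3}{2} \end{pmatrix}. \]

Find an orthogonal matrix U such that $U^TAU$ is a diagonal matrix.

In this case, a tedious computation shows the eigenvalues are 2 and 1. First we will find an
eigenvector for the eigenvalue 2. This involves row reducing the following augmented matrix.

\[ \left( \begin{array}{ccc|c} 2 - 1 & 0 & 0 & 0 \\ 0 & 2 - \frac{3}{2} & -\frac{1}{2} & 0 \\ 0 & -\frac{1}{2} & 2 - \frac{3}{2} & 0 \end{array} \right) \]

The row reduced echelon form is

\[ \left( \begin{array}{ccc|c} 1 & 0 & 0 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 0 \end{array} \right) \]

and so an eigenvector is

\[ \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix}. \]

However, it is desired that the eigenvectors obtained all be unit vectors and so dividing this vector
by its length gives

\[ \begin{pmatrix} 0 \\ 1/\sqrt{2} \\ 1/\sqrt{2} \end{pmatrix}. \]

Next consider the case of the eigenvalue, 1. The matrix which needs to be row reduced in this case
is

\[ \left( \begin{array}{ccc|c} 1 - 1 & 0 & 0 & 0 \\ 0 & 1 - \frac{3}{2} & -\frac{1}{2} & 0 \\ 0 & -\frac{1}{2} & 1 - \frac{3}{2} & 0 \end{array} \right). \]

The row reduced echelon form is

\[ \left( \begin{array}{ccc|c} 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{array} \right). \]

Therefore, the eigenvectors are of the form

\[ \begin{pmatrix} s \\ -t \\ t \end{pmatrix}. \]

Two of these which are orthonormal are

\[ \begin{pmatrix} 0 \\ -1/\sqrt{2} \\ 1/\sqrt{2} \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}. \]

An orthogonal matrix which works in the process is then obtained by letting these vectors be the
columns.

\[ U = \begin{pmatrix} 0 & 1 & 0 \\ -1/\sqrt{2} & 0 & 1/\sqrt{2} \\ 1/\sqrt{2} & 0 & 1/\sqrt{2} \end{pmatrix} \]

It remains to verify this works. $U^TAU$ is of the form

\[ \begin{pmatrix} 0 & -\frac{1}{2}\sqrt{2} & \frac{1}{2}\sqrt{2} \\ 1 & 0 & 0 \\ 0 & \frac{1}{2}\sqrt{2} & \frac{1}{2}\sqrt{2} \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & \frac{3}{2} & \frac{1}{2} \\ 0 & \frac{1}{2} & \frac{3}{2} \end{pmatrix} \begin{pmatrix} 0 & 1 & 0 \\ -1/\sqrt{2} & 0 & 1/\sqrt{2} \\ 1/\sqrt{2} & 0 & 1/\sqrt{2} \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{pmatrix}, \]

the desired diagonal matrix.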
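
The final multiplication is exactly the kind of arithmetic best left to the machine. The sketch below assumes that A is the matrix with rows (1, 0, 0), (0, 3/2, 1/2), (0, 1/2, 3/2) and that U has the unit eigenvectors as its columns in the order shown in the code. Both are readings of the example and should be treated as assumptions; the function names are illustrative.

```python
import math


def transpose(M):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]


def mat_mul(A, B):
    """Ordinary matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


R = 1 / math.sqrt(2)

# Assumed reading of Example 13.1.19.
A = [[1, 0, 0],
     [0, 1.5, 0.5],
     [0, 0.5, 1.5]]

# Columns: unit eigenvectors for the eigenvalues 1, 1 and 2.
U = [[0, 1, 0],
     [-R, 0, R],
     [R, 0, R]]

D = mat_mul(mat_mul(transpose(U), A), U)
for row in D:
    print([round(entry, 12) for entry in row])
```

Up to rounding, the product is diagonal and its diagonal entries are the eigenvalues, listed in the same order as the corresponding columns of U.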



13.2 Fundamental Theory And Generalizations 

13.2.1 Block Multiplication Of Matrices 

Consider the following problem

\[ \begin{pmatrix} A & B \\ C & D \end{pmatrix} \begin{pmatrix} E & F \\ G & H \end{pmatrix}. \]

You know how to do this. You get

\[ \begin{pmatrix} AE + BG & AF + BH \\ CE + DG & CF + DH \end{pmatrix}. \]

Now what if instead of numbers, the entries, A, B, C, D, E, F, G, H are matrices of a size such that the
multiplications and additions needed in the above formula all make sense. Would the formula be
true in this case? I will show below that this is true.
Suppose A is a matrix of the form

\[ A = \begin{pmatrix} A_{11} & \cdots & A_{1m} \\ \vdots & \ddots & \vdots \\ A_{r1} & \cdots & A_{rm} \end{pmatrix} \tag{13.3} \]

where $A_{ij}$ is a $s_i \times p_j$ matrix where $s_i$ is constant for $j = 1, \dots, m$ for each $i = 1, \dots, r$. Such a
matrix is called a block matrix, also a partitioned matrix. How do you get the block $A_{ij}$? Here
is how for A an m x n matrix:

\[ A_{ij} = \overbrace{\begin{pmatrix} 0 & I_{s_i \times s_i} & 0 \end{pmatrix}}^{s_i \times m} \, A \, \overbrace{\begin{pmatrix} 0 \\ I_{p_j \times p_j} \\ 0 \end{pmatrix}}^{n \times p_j} \tag{13.4} \]

In the block column matrix on the right, you need to have $c_j - 1$ rows of zeros above the small $p_j \times p_j$
identity matrix where the columns of A involved in $A_{ij}$ are $c_j, \dots, c_j + p_j - 1$ and in the block row
matrix on the left, you need to have $r_i - 1$ columns of zeros to the left of the $s_i \times s_i$ identity matrix
where the rows of A involved in $A_{ij}$ are $r_i, \dots, r_i + s_i - 1$. An important observation to make is that
the matrix on the right specifies columns to use in the block and the one on the left specifies the
rows used. Thus the block $A_{ij}$ in this case is a matrix of size $s_i \times p_j$. There is no overlap between
the blocks of A. Thus the n x n identity matrix corresponding to multiplication on the right
of A is of the form

\[ \begin{pmatrix} I_{p_1 \times p_1} & & 0 \\ & \ddots & \\ 0 & & I_{p_m \times p_m} \end{pmatrix} \]

where these little identity matrices don't overlap. A similar conclusion follows from consideration of the
matrices $I_{s_i \times s_i}$.

Next consider the question of multiplication of two block matrices. Let B be a block matrix of
the form

\[ \begin{pmatrix} B_{11} & \cdots & B_{1p} \\ \vdots & \ddots & \vdots \\ B_{r1} & \cdots & B_{rp} \end{pmatrix} \tag{13.5} \]

and A is a block matrix of the form

\[ \begin{pmatrix} A_{11} & \cdots & A_{1m} \\ \vdots & \ddots & \vdots \\ A_{p1} & \cdots & A_{pm} \end{pmatrix} \tag{13.6} \]

and that for all i, j, it makes sense to multiply $B_{is} A_{sj}$ for all $s \in \{1, \dots, p\}$. (That is the two
matrices, $B_{is}$ and $A_{sj}$ are conformable.) and that for fixed ij, it follows $B_{is} A_{sj}$ is the same size for
each s so that it makes sense to write $\sum_s B_{is} A_{sj}$.

The following theorem says essentially that when you take the product of two matrices, you can
do it two ways. One way is to simply multiply them forming BA. The other way is to partition
both matrices, formally multiply the blocks to get another block matrix and this one will be BA
partitioned. Before presenting this theorem, here is a simple lemma which is really a special case of
the theorem.

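Lemma 13.2.1 Consider the following product.

\[ \begin{pmatrix} 0 \\ I \\ 0 \end{pmatrix} \begin{pmatrix} 0 & I & 0 \end{pmatrix} \]

where the first is n x r and the second is r x n. The small identity matrix I is an r x r matrix and
there are l zero rows above I and l zero columns to the left of I in the right matrix. Then the product
of these matrices is a block matrix of the form

\[ \begin{pmatrix} 0 & 0 & 0 \\ 0 & I & 0 \\ 0 & 0 & 0 \end{pmatrix}. \]

Proof: From the definition of the way you multiply matrices, the product is

\[ \begin{pmatrix} \begin{pmatrix} 0 \\ I \\ 0 \end{pmatrix} 0 & \cdots & \begin{pmatrix} 0 \\ I \\ 0 \end{pmatrix} 0 & \begin{pmatrix} 0 \\ I \\ 0 \end{pmatrix} e_1 & \cdots & \begin{pmatrix} 0 \\ I \\ 0 \end{pmatrix} e_r & \begin{pmatrix} 0 \\ I \\ 0 \end{pmatrix} 0 & \cdots & \begin{pmatrix} 0 \\ I \\ 0 \end{pmatrix} 0 \end{pmatrix} \]

which yields the claimed result. In the formula $e_j$ refers to the column vector of length r which has
a 1 in the $j^{th}$ position. ■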

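Theorem 13.2.2 Let B be a q x p block matrix as in 13.5 and let A be a p x n block matrix as in
13.6 such that $B_{is}$ is conformable with $A_{sj}$ and each product, $B_{is} A_{sj}$ for $s = 1, \dots, p$ is of the same
size so they can be added. Then BA can be obtained as a block matrix such that the $ij^{th}$ block is of
the form

\[ \sum_s B_{is} A_{sj}. \tag{13.7} \]

Proof: From 13.4

\[ B_{is} A_{sj} = \begin{pmatrix} 0 & I_{r_i \times r_i} & 0 \end{pmatrix} B \begin{pmatrix} 0 \\ I_{p_s \times p_s} \\ 0 \end{pmatrix} \begin{pmatrix} 0 & I_{p_s \times p_s} & 0 \end{pmatrix} A \begin{pmatrix} 0 \\ I_{q_j \times q_j} \\ 0 \end{pmatrix} \]

where here it is assumed $B_{is}$ is $r_i \times p_s$ and $A_{sj}$ is $p_s \times q_j$. The product involves the $s^{th}$ block in the
$i^{th}$ row of blocks for B and the $s^{th}$ block in the $j^{th}$ column of A. Thus there are the same number of
rows above the $I_{p_s \times p_s}$ as there are columns to the left of $I_{p_s \times p_s}$ in those two inside matrices. Then
from Lemma 13.2.1

\[ \begin{pmatrix} 0 \\ I_{p_s \times p_s} \\ 0 \end{pmatrix} \begin{pmatrix} 0 & I_{p_s \times p_s} & 0 \end{pmatrix} = \begin{pmatrix} 0 & 0 & 0 \\ 0 & I_{p_s \times p_s} & 0 \\ 0 & 0 & 0 \end{pmatrix}. \]

Since the blocks of small identity matrices do not overlap,

\[ \sum_s \begin{pmatrix} 0 & 0 & 0 \\ 0 & I_{p_s \times p_s} & 0 \\ 0 & 0 & 0 \end{pmatrix} = \begin{pmatrix} I_{p_1 \times p_1} & & 0 \\ & \ddots & \\ 0 & & I_{p_p \times p_p} \end{pmatrix} = I \]

and so

\[ \sum_s B_{is} A_{sj} = \sum_s \begin{pmatrix} 0 & I_{r_i \times r_i} & 0 \end{pmatrix} B \begin{pmatrix} 0 \\ I_{p_s \times p_s} \\ 0 \end{pmatrix} \begin{pmatrix} 0 & I_{p_s \times p_s} & 0 \end{pmatrix} A \begin{pmatrix} 0 \\ I_{q_j \times q_j} \\ 0 \end{pmatrix} \]

\[ = \begin{pmatrix} 0 & I_{r_i \times r_i} & 0 \end{pmatrix} B \left[ \sum_s \begin{pmatrix} 0 \\ I_{p_s \times p_s} \\ 0 \end{pmatrix} \begin{pmatrix} 0 & I_{p_s \times p_s} & 0 \end{pmatrix} \right] A \begin{pmatrix} 0 \\ I_{q_j \times q_j} \\ 0 \end{pmatrix} \]

\[ = \begin{pmatrix} 0 & I_{r_i \times r_i} & 0 \end{pmatrix} B I A \begin{pmatrix} 0 \\ I_{q_j \times q_j} \\ 0 \end{pmatrix} = \begin{pmatrix} 0 & I_{r_i \times r_i} & 0 \end{pmatrix} BA \begin{pmatrix} 0 \\ I_{q_j \times q_j} \\ 0 \end{pmatrix} \]

which equals the $ij^{th}$ block of BA. Hence the $ij^{th}$ block of BA equals the formal multiplication
according to matrix multiplication,

\[ \sum_s B_{is} A_{sj}. \quad ■ \]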



Example 13.2.3 Let an n x n matrix have the form

\[ A = \begin{pmatrix} a & b \\ c & P \end{pmatrix} \]

where P is $(n-1) \times (n-1)$. Multiply it by

\[ B = \begin{pmatrix} p & q \\ r & Q \end{pmatrix} \]

where B is also an n x n matrix and Q is $(n-1) \times (n-1)$.

You use block multiplication

\[ \begin{pmatrix} a & b \\ c & P \end{pmatrix} \begin{pmatrix} p & q \\ r & Q \end{pmatrix} = \begin{pmatrix} ap + br & aq + bQ \\ pc + Pr & cq + PQ \end{pmatrix}. \]

Note that this all makes sense. For example, b is $1 \times (n-1)$ and r is $(n-1) \times 1$ so br is a 1 x 1. Similar
considerations apply to the other blocks.
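
Block multiplication can be tried out on small integer matrices, where the comparison with the ordinary product is exact. The following sketch partitions two 3 x 3 matrices in the pattern of Example 13.2.3, a 1 x 1 block in the upper left corner and a 2 x 2 block in the lower right, multiplies block by block using formula 13.7, and reassembles the result. The helper names `split`, `join` and `block_mul` and the particular matrices are illustrative assumptions, and the code handles only a 2 x 2 grid of blocks.

```python
def mat_mul(A, B):
    """Ordinary matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def mat_add(A, B):
    """Entrywise sum of two matrices of the same size."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]


def split(M, r, c):
    """Partition M into a 2 x 2 grid of blocks, cutting after row r and after column c."""
    top, bottom = M[:r], M[r:]
    return [[[row[:c] for row in top], [row[c:] for row in top]],
            [[row[:c] for row in bottom], [row[c:] for row in bottom]]]


def join(blocks):
    """Reassemble a 2 x 2 grid of blocks into a single matrix."""
    top = [left + right for left, right in zip(blocks[0][0], blocks[0][1])]
    bottom = [left + right for left, right in zip(blocks[1][0], blocks[1][1])]
    return top + bottom


def block_mul(Bb, Ab):
    """The ij-th block of the product is the sum over s of Bb[i][s] * Ab[s][j]."""
    return [[mat_add(mat_mul(Bb[i][0], Ab[0][j]), mat_mul(Bb[i][1], Ab[1][j]))
             for j in range(2)] for i in range(2)]


B = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
A = [[2, 0, 1],
     [1, 3, 0],
     [0, 1, 4]]

# The column cut of B must match the row cut of A so the blocks are conformable.
blockwise = join(block_mul(split(B, 1, 1), split(A, 1, 1)))
print(blockwise)
print(mat_mul(B, A))
```

The two printed matrices agree, which is what Theorem 13.2.2 asserts for this partition.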

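Here is an interesting and significant application of block multiplication. In this theorem, $p_M(t)$
denotes the characteristic polynomial, $\det(tI - M)$. Thus the zeros of this polynomial are the
eigenvalues of the matrix M.

Theorem 13.2.4 Let A be an m x n matrix and let B be an n x m matrix for $m \leq n$. Then

\[ p_{BA}(t) = t^{n-m} p_{AB}(t), \]

so the eigenvalues of BA and AB are the same including multiplicities except that BA has $n - m$
extra zero eigenvalues.

Proof: Use block multiplication to write

\[ \begin{pmatrix} AB & 0 \\ B & 0 \end{pmatrix} \begin{pmatrix} I & A \\ 0 & I \end{pmatrix} = \begin{pmatrix} AB & ABA \\ B & BA \end{pmatrix} \]

\[ \begin{pmatrix} I & A \\ 0 & I \end{pmatrix} \begin{pmatrix} 0 & 0 \\ B & BA \end{pmatrix} = \begin{pmatrix} AB & ABA \\ B & BA \end{pmatrix}. \]

Therefore,

\[ \begin{pmatrix} I & A \\ 0 & I \end{pmatrix}^{-1} \begin{pmatrix} AB & 0 \\ B & 0 \end{pmatrix} \begin{pmatrix} I & A \\ 0 & I \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ B & BA \end{pmatrix}. \]

Since the two matrices above are similar it follows that $\begin{pmatrix} 0 & 0 \\ B & BA \end{pmatrix}$ and $\begin{pmatrix} AB & 0 \\ B & 0 \end{pmatrix}$ have the
same characteristic polynomials. Therefore, noting that BA is an n x n matrix and AB is an m x m
matrix,

\[ t^m \det(tI - BA) = t^n \det(tI - AB) \]

and so $\det(tI - BA) = p_{BA}(t) = t^{n-m} \det(tI - AB) = t^{n-m} p_{AB}(t)$. ■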
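
The identity can be tested on a small case with integer entries, where everything is exact. In the sketch below A is 2 x 3 and B is 3 x 2, so $m = 2$, $n = 3$ and the claim reads $p_{BA}(t) = t\, p_{AB}(t)$. The matrices are arbitrary choices, the determinant is computed by cofactor expansion along the first row, and the function names are illustrative rather than taken from the text.

```python
def mat_mul(A, B):
    """Ordinary matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def det(M):
    """Determinant by cofactor expansion along the first row (fine for small matrices)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))


def char_poly_at(M, t):
    """Value of the characteristic polynomial det(tI - M) at the number t."""
    n = len(M)
    return det([[(t if i == j else 0) - M[i][j] for j in range(n)] for i in range(n)])


A = [[1, 2, 0],
     [0, 1, 3]]   # m x n with m = 2, n = 3
B = [[1, 0],
     [2, 1],
     [0, 4]]      # n x m

AB = mat_mul(A, B)  # 2 x 2
BA = mat_mul(B, A)  # 3 x 3
m, n = len(A), len(B)

for t in range(-3, 4):
    print(t, char_poly_at(BA, t), t ** (n - m) * char_poly_at(AB, t))
```

Two polynomials of degree 3 which agree at seven points are identical, so agreement on this range of t confirms the identity for these particular matrices.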

13.2.2 Orthonormal Bases, Gram Schmidt Process 

Not all bases for $\mathbb{F}^n$ are created equal. Recall $\mathbb{F}$ equals either $\mathbb{C}$ or $\mathbb{R}$ and the dot product is given
by

\[ x \cdot y \equiv (x, y) \equiv \langle x, y \rangle = \sum_j x_j \overline{y_j}. \]

The best bases are orthonormal. Much of what follows will be for $\mathbb{F}^n$ in the interest of generality.

Definition 13.2.5 Suppose $\{v_1, \dots, v_k\}$ is a set of vectors in $\mathbb{F}^n$. It is an orthonormal set if

\[ v_i \cdot v_j = \delta_{ij} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases} \]

Every orthonormal set of vectors is automatically linearly independent.

Proposition 13.2.6 Suppose $\{v_1, \dots, v_k\}$ is an orthonormal set of vectors. Then it is linearly
independent.

Proof: Suppose $\sum_{i=1}^{k} c_i v_i = 0$. Then taking dot products with $v_j$,

\[ 0 = 0 \cdot v_j = \sum_i c_i v_i \cdot v_j = \sum_i c_i \delta_{ij} = c_j. \]

Since j is arbitrary, this shows the set is linearly independent as claimed. ■

It turns out that if X is any subspace of $\mathbb{F}^m$, then there exists an orthonormal basis for X. This
follows from the use of the next lemma applied to a basis for X.

Lemma 13.2.7 Let $\{x_1, \dots, x_n\}$ be a linearly independent subset of $\mathbb{F}^p$, $p \geq n$. Then there exist
orthonormal vectors $\{u_1, \dots, u_n\}$ which have the property that for each $k \leq n$,
$\operatorname{span}(x_1, \dots, x_k) = \operatorname{span}(u_1, \dots, u_k)$.

Proof: Let $u_1 = x_1 / |x_1|$. Thus for $k = 1$, $\operatorname{span}(u_1) = \operatorname{span}(x_1)$ and $\{u_1\}$ is an orthonormal
set. Now suppose for some $k < n$, $u_1, \dots, u_k$ have been chosen such that $(u_j, u_l) = \delta_{jl}$ and
$\operatorname{span}(x_1, \dots, x_k) = \operatorname{span}(u_1, \dots, u_k)$. Then define

\[ u_{k+1} = \frac{x_{k+1} - \sum_{j=1}^{k} (x_{k+1} \cdot u_j)\, u_j}{\left| x_{k+1} - \sum_{j=1}^{k} (x_{k+1} \cdot u_j)\, u_j \right|}, \tag{13.8} \]

where the denominator is not equal to zero because the $x_j$ form a basis, and so

\[ x_{k+1} \notin \operatorname{span}(x_1, \dots, x_k) = \operatorname{span}(u_1, \dots, u_k). \]

Thus by induction,

\[ u_{k+1} \in \operatorname{span}(u_1, \dots, u_k, x_{k+1}) = \operatorname{span}(x_1, \dots, x_k, x_{k+1}). \]

Also, $x_{k+1} \in \operatorname{span}(u_1, \dots, u_k, u_{k+1})$ which is seen easily by solving 13.8 for $x_{k+1}$ and it follows

\[ \operatorname{span}(x_1, \dots, x_k, x_{k+1}) = \operatorname{span}(u_1, \dots, u_k, u_{k+1}). \]

If $l \leq k$, then letting C denote the reciprocal of the denominator in 13.8,

\[ (u_{k+1} \cdot u_l) = C \Big( (x_{k+1} \cdot u_l) - \sum_{j=1}^{k} (x_{k+1} \cdot u_j)(u_j \cdot u_l) \Big) \]

\[ = C \Big( (x_{k+1} \cdot u_l) - \sum_{j=1}^{k} (x_{k+1} \cdot u_j)\, \delta_{lj} \Big) \]

\[ = C \big( (x_{k+1} \cdot u_l) - (x_{k+1} \cdot u_l) \big) = 0. \]

The vectors, $\{u_j\}_{j=1}^{n}$, generated in this way are therefore orthonormal because each vector has unit
length. ■

The process by which these vectors were generated is called the Gram Schmidt process. Note
that from the construction, each $x_k$ is in the span of $\{u_1, \dots, u_k\}$. In terms of matrices, this says

\[ \begin{pmatrix} x_1 & \cdots & x_n \end{pmatrix} = \begin{pmatrix} u_1 & \cdots & u_n \end{pmatrix} R \]

where R is an upper triangular matrix. This is closely related to the QR factorization discussed
earlier. It is called the thin QR factorization. If the Gram Schmidt process is used to enlarge
$\{u_1, \dots, u_n\}$ to an orthonormal basis for $\mathbb{F}^m$, $\{u_1, \dots, u_n, u_{n+1}, \dots, u_m\}$, then if Q is the matrix which
has these vectors as columns and if R is also enlarged to $R'$ by adding in rows of zeros, if necessary,
to form an m x n matrix, then the above would be of the form

\[ \begin{pmatrix} x_1 & \cdots & x_n \end{pmatrix} = \begin{pmatrix} u_1 & \cdots & u_m \end{pmatrix} R' \]

and you could read off the orthonormal basis for $\operatorname{span}(x_1, \dots, x_n)$ by simply taking the first n columns
of $Q = \begin{pmatrix} u_1 & \cdots & u_m \end{pmatrix}$. This is convenient because computer algebra systems are set up to find QR
factorizations.
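
For readers who would like to see the process itself rather than a library call, formula 13.8 translates almost line by line into code. The sketch below is for real vectors only and uses the classical form of the process exactly as in the lemma. The tolerance used to detect a zero denominator, the sample vectors and the function names are assumptions made for illustration. The entries of R are computed as $R_{ij} = u_i \cdot x_j$.

```python
import math


def dot(x, y):
    """Real dot product."""
    return sum(a * b for a, b in zip(x, y))


def gram_schmidt(xs, tol=1e-12):
    """Return orthonormal vectors u_1, ..., u_n with span(u_1..u_k) = span(x_1..x_k)."""
    us = []
    for x in xs:
        # Subtract from x its components along the vectors already constructed.
        w = list(x)
        for u in us:
            c = dot(x, u)
            w = [wi - c * ui for wi, ui in zip(w, u)]
        length = math.sqrt(dot(w, w))
        if length <= tol:
            raise ValueError("the given vectors are not linearly independent")
        us.append([wi / length for wi in w])
    return us


xs = [[1.0, 1.0, 0.0],
      [1.0, 0.0, 1.0]]
us = gram_schmidt(xs)

# R[i][j] = u_i . x_j; it is upper triangular because x_j lies in span(u_1, ..., u_j).
R = [[dot(u, x) for x in xs] for u in us]

for u in us:
    print(u)
print(R)
```

In floating point arithmetic the classical process can lose orthogonality when the given vectors are nearly dependent, which is one reason to prefer a built in QR factorization for serious computations.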
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Example 13.2.8 Find an orthonormal basis for span 



This is really easy to do using a computer algebra system. 



1 1 -^- 

506 

11 — 9 - 
L± 506 

IT -±- 

1 ' 253 



11V46 

ITVT6 

llx/46 



46 

J_ 

46 
__3_ 
23 



46 
46 
46 



and so the desired orthonormal basis is 



11 
IT 
IT 



5MVTV46 
-5§6VW46 

^vTTvTe 



11 



I^VTT 

o ^-vTivTe 

o o 



13.2.3 Schur's Theorem

Every matrix is related to an upper triangular matrix in a particularly significant way. This is Schur's
theorem and it is the most important theorem in the spectral theory of matrices. The important
result which makes this theorem possible is the Gram Schmidt procedure of Lemma 13.2.7.

Definition 13.2.9 An n x n matrix U is unitary if $UU^* = I = U^*U$ where $U^*$ is defined to be the
transpose of the conjugate of U. Thus $U^*_{ij} = \overline{U_{ji}}$. Note that every real orthogonal matrix is unitary.
For A any matrix, $A^*$, just defined as the conjugate of the transpose, is called the adjoint.

Note that if $U = \begin{pmatrix} v_1 & \cdots & v_n \end{pmatrix}$ where the $v_k$ are orthonormal vectors in $\mathbb{C}^n$, then U is unitary.
This follows because the $ij^{th}$ entry of $U^*U$ is $\overline{v_i}^T v_j = \delta_{ij}$ since the $v_i$ are assumed orthonormal.

Lemma 13.2.10 The following holds. $(AB)^* = B^* A^*$.

Proof: From the definition and remembering the properties of complex conjugation,

\[ \left( (AB)^* \right)_{ij} = \overline{(AB)_{ji}} = \overline{\sum_k A_{jk} B_{ki}} = \sum_k \overline{A_{jk}}\, \overline{B_{ki}} = \sum_k B^*_{ik} A^*_{kj} = (B^* A^*)_{ij}. \quad ■ \]
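
The adjoint is simple to compute, and Lemma 13.2.10 can be spot checked on matrices with complex entries. The following sketch stores a matrix as a list of rows of Python numbers; the function names and the sample matrices are illustrative assumptions, with entries chosen to have integer real and imaginary parts so that the comparison is exact.

```python
def adjoint(M):
    """Conjugate transpose: the (i, j) entry is the conjugate of the (j, i) entry of M."""
    rows, cols = len(M), len(M[0])
    return [[M[j][i].conjugate() for j in range(rows)] for i in range(cols)]


def mat_mul(A, B):
    """Ordinary matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


A = [[1 + 2j, 3],
     [0, 1j]]
B = [[2, 1 - 1j],
     [1j, 4]]

print(adjoint(mat_mul(A, B)))
print(mat_mul(adjoint(B), adjoint(A)))

# A matrix equal to its own adjoint is self adjoint (Hermitian).
H = [[2, 1 - 1j],
     [1 + 1j, 3]]
print(adjoint(H) == H)
```

Note the reversal of order: comparing with `mat_mul(adjoint(A), adjoint(B))` instead would in general fail.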

Theorem 13.2.11 Let A be an n x n matrix. Then there exists a unitary matrix U such that

\[ U^*AU = T, \tag{13.9} \]

where T is an upper triangular matrix having the eigenvalues of A on the main diagonal listed
according to multiplicity as roots of the characteristic equation.

Proof: The theorem is clearly true if A is a 1 x 1 matrix. Just let $U = 1$, the 1 x 1 matrix which
has entry 1. Suppose it is true for $(n-1) \times (n-1)$ matrices and let A be an n x n matrix. Then
let $v_1$ be a unit eigenvector for A. Then there exists $\lambda_1$ such that

\[ Av_1 = \lambda_1 v_1, \quad |v_1| = 1. \]

Extend $\{v_1\}$ to a basis and then use the Gram Schmidt process to obtain $\{v_1, \dots, v_n\}$, an
orthonormal basis of $\mathbb{C}^n$. Let $U_0$ be a matrix whose $i^{th}$ column is $v_i$ so that $U_0$ is unitary. Also
$U_0^* A U_0$ is of the form

\[ \begin{pmatrix} \lambda_1 & * & \cdots & * \\ 0 & & & \\ \vdots & & A_1 & \\ 0 & & & \end{pmatrix} \]

where $A_1$ is an $(n-1) \times (n-1)$ matrix. Now by induction, there exists an $(n-1) \times (n-1)$ unitary
matrix $\widetilde{U}_1$ such that

\[ \widetilde{U}_1^* A_1 \widetilde{U}_1 = T_{n-1}, \]

an upper triangular matrix. Consider

\[ U_1 = \begin{pmatrix} 1 & 0 \\ 0 & \widetilde{U}_1 \end{pmatrix}. \]

An application of block multiplication shows that $U_1$ is a unitary matrix and also that

\[ U_1^* U_0^* A U_0 U_1 = \begin{pmatrix} 1 & 0 \\ 0 & \widetilde{U}_1^* \end{pmatrix} \begin{pmatrix} \lambda_1 & * \\ 0 & A_1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & \widetilde{U}_1 \end{pmatrix} = \begin{pmatrix} \lambda_1 & * \\ 0 & T_{n-1} \end{pmatrix} = T \]

where T is upper triangular. Then let $U = U_0 U_1$. Since $(U_0 U_1)^* = U_1^* U_0^*$, it follows that A is similar
to T and that $U_0 U_1$ is unitary. Hence A and T have the same characteristic polynomials, and since
the eigenvalues of T are the diagonal entries listed with multiplicity, this proves the theorem. ■
As a simple consequence of the above theorem, here is an interesting lemma.

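Lemma 13.2.12 Let A be of the form

\[ A = \begin{pmatrix} P_1 & \cdots & * \\ \vdots & \ddots & \vdots \\ 0 & \cdots & P_s \end{pmatrix} \]

where $P_k$ is an $m_k \times m_k$ matrix. Then

\[ \det(A) = \prod_k \det(P_k). \]

Proof: Let $U_k$ be an $m_k \times m_k$ unitary matrix such that

\[ U_k^* P_k U_k = T_k \]

where $T_k$ is upper triangular. Then letting

\[ U = \begin{pmatrix} U_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & U_s \end{pmatrix}, \]

it follows

\[ U^* = \begin{pmatrix} U_1^* & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & U_s^* \end{pmatrix} \]

and

\[ \begin{pmatrix} U_1^* & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & U_s^* \end{pmatrix} \begin{pmatrix} P_1 & \cdots & * \\ \vdots & \ddots & \vdots \\ 0 & \cdots & P_s \end{pmatrix} \begin{pmatrix} U_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & U_s \end{pmatrix} = \begin{pmatrix} T_1 & \cdots & * \\ \vdots & \ddots & \vdots \\ 0 & \cdots & T_s \end{pmatrix} \]

and so

\[ \det(A) = \prod_k \det(T_k) = \prod_k \det(P_k). \]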

Definition 13.2.13 ^4n n x n matrix A is called Hermitian if A = A*. Thus a real symmetric 
matrix is Hermitian. 
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Theorem 13.2.14 If A is Hermitian, there exists a unitary matrix U such that 

U*AU = D (13.10) 

where D is a real diagonal matrix. That is, D has nonzero entries only on the main diagonal and 
these are real. Furthermore, the columns of U are an orthonormal basis for ¥ n . 

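Proof: From Schur's theorem above, there exists $U$ unitary such that

\[
U^*AU = T
\]

where $T$ is an upper triangular matrix. Then from Lemma 13.2.10 and the assumption $A = A^*$,

\[
T^* = (U^*AU)^* = U^*A^*U = U^*AU = T.
\]

Thus $T = T^*$ and $T$ is upper triangular. This can only happen if $T$ is really a diagonal matrix
having real entries on the main diagonal. (If $i \neq j$, one of $T_{ij}$ or $T_{ji}$ equals zero. But $T_{ij} = \overline{T_{ji}}$ and
so they are both zero. Also $T_{ii} = \overline{T_{ii}}$.)
Finally, let

\[
U = \begin{pmatrix} \mathbf{u}_1 & \mathbf{u}_2 & \cdots & \mathbf{u}_n \end{pmatrix}
\]

where the $\mathbf{u}_i$ denote the columns of $U$ and

\[
D = \begin{pmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{pmatrix}.
\]

The equation $U^*AU = D$ implies

\[
AU = \begin{pmatrix} A\mathbf{u}_1 & A\mathbf{u}_2 & \cdots & A\mathbf{u}_n \end{pmatrix}
= UD = \begin{pmatrix} \lambda_1\mathbf{u}_1 & \lambda_2\mathbf{u}_2 & \cdots & \lambda_n\mathbf{u}_n \end{pmatrix}
\]

where the entries denote the columns of $AU$ and $UD$ respectively. Therefore, $A\mathbf{u}_i = \lambda_i\mathbf{u}_i$ and since
the matrix is unitary, the $ij^{th}$ entry of $U^*U$ equals $\delta_{ij}$ and so

\[
\delta_{ij} = \mathbf{u}_i^*\mathbf{u}_j = \overline{\mathbf{u}_i}^{T}\mathbf{u}_j = \mathbf{u}_j \cdot \mathbf{u}_i.
\]

This proves the theorem because it shows the vectors $\{\mathbf{u}_i\}$ form an orthonormal basis of eigenvectors. ■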
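As a small numerical illustration of Theorem 13.2.14, the following sketch checks the diagonalization for one hand-picked real symmetric matrix. The matrix `A` and the orthogonal matrix `U` are illustrative choices made here, not taken from the text, and the helper names `matmul` and `transpose` are hypothetical. Since everything is real, $U^*$ is just the transpose.

```python
import math

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    """Transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*X)]

# Illustrative symmetric matrix with eigenvalues 3 and 1.
A = [[2.0, 1.0], [1.0, 2.0]]

# Columns of U are the unit eigenvectors (1,1)/sqrt(2) and (1,-1)/sqrt(2).
s = 1.0 / math.sqrt(2.0)
U = [[s, s], [s, -s]]

D = matmul(transpose(U), matmul(A, U))   # should be close to diag(3, 1)
UtU = matmul(transpose(U), U)            # should be close to the identity
```

Up to rounding, `D` is diagonal with the eigenvalues on the diagonal and `UtU` is the identity, which is the statement that the columns of `U` form an orthonormal basis of eigenvectors.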

Corollary 13.2.15 If $A$ is a real symmetric ($A = A^T$) matrix, then $A$ is Hermitian and there exists
a real unitary matrix $U$ such that $U^TAU = D$ where $D$ is a diagonal matrix.

Proof: This follows from Theorem 13.2.14 which says the eigenvalues are all real. Then if
$A\mathbf{x} = \lambda\mathbf{x}$, the same is true of $\overline{\mathbf{x}}$, and hence of the real and imaginary parts of $\mathbf{x}$, at least one of
which is nonzero. So in the construction for Schur's theorem, you can always
deal exclusively with real eigenvectors as long as your matrices are real and symmetric. When you
construct the matrix which reduces the problem to a smaller one having $A_1$ in the lower right corner,
use the Gram Schmidt process on $\mathbb{R}^n$ using the real dot product to construct vectors $\mathbf{v}_2, \cdots, \mathbf{v}_n$ in
$\mathbb{R}^n$ such that $\{\mathbf{v}_1, \cdots, \mathbf{v}_n\}$ is an orthonormal basis for $\mathbb{R}^n$. The matrix $A_1$ is symmetric also. This
is because for $j, k \geq 2$

\[
(A_1)_{kj} = \mathbf{v}_k^TA\mathbf{v}_j = \left(\mathbf{v}_k^TA\mathbf{v}_j\right)^T = \mathbf{v}_j^TA\mathbf{v}_k = (A_1)_{jk}.
\]

Therefore, continuing this way, the process of the proof delivers only real vectors and real matrices. ■
13.3 Least Square Approximation

A very important technique is that of the least square approximation.

Lemma 13.3.1 Let $A$ be an $m \times n$ matrix and let $A(\mathbb{F}^n)$ denote the set of vectors in $\mathbb{F}^m$ which are
of the form $A\mathbf{x}$ for some $\mathbf{x} \in \mathbb{F}^n$. Then $A(\mathbb{F}^n)$ is a subspace of $\mathbb{F}^m$.

Proof: Let $A\mathbf{x}$ and $A\mathbf{y}$ be two points of $A(\mathbb{F}^n)$. It suffices to verify that if $a, b$ are scalars, then
$aA\mathbf{x} + bA\mathbf{y}$ is also in $A(\mathbb{F}^n)$. But $aA\mathbf{x} + bA\mathbf{y} = A(a\mathbf{x} + b\mathbf{y})$ because $A$ is linear. ■

There is also a useful observation about orthonormal sets of vectors which is stated in the next 
lemma. 

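Lemma 13.3.2 Suppose $\{\mathbf{x}_1, \mathbf{x}_2, \cdots, \mathbf{x}_r\}$ is an orthonormal set of vectors. Then if $c_1, \cdots, c_r$ are
scalars,

\[
\left| \sum_{k=1}^{r} c_k\mathbf{x}_k \right|^2 = \sum_{k=1}^{r} |c_k|^2.
\]

Proof: This follows from the definition. From the properties of the dot product and using the
fact that the given set of vectors is orthonormal,

\[
\left| \sum_{k=1}^{r} c_k\mathbf{x}_k \right|^2
= \left( \sum_{k=1}^{r} c_k\mathbf{x}_k, \sum_{j=1}^{r} c_j\mathbf{x}_j \right)
= \sum_{k,j} c_k\overline{c_j}\,(\mathbf{x}_k, \mathbf{x}_j)
= \sum_{k=1}^{r} |c_k|^2. \quad ■
\]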



The following theorem gives the equivalence of an orthogonality condition with a minimization 
condition. 

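Theorem 13.3.3 Let $\mathbf{y} \in \mathbb{F}^m$ and let $A$ be an $m \times n$ matrix. Then there exists $\mathbf{x} \in \mathbb{F}^n$ minimizing
the function $\mathbf{x} \mapsto |\mathbf{y} - A\mathbf{x}|^2$. Furthermore, $\mathbf{x}$ minimizes this function if and only if

\[
(\mathbf{y} - A\mathbf{x}) \cdot A\mathbf{w} = 0
\]

for all $\mathbf{w} \in \mathbb{F}^n$.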

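Proof: Let $\{A\mathbf{x}_k\}_{k=1}^{r}$ be an orthonormal basis for $A\mathbb{F}^n$. Next note that for a given $\mathbf{y}$,

\[
\left( \mathbf{y} - \sum_{k=1}^{r} (\mathbf{y}, A\mathbf{x}_k)\,A\mathbf{x}_k,\; A\mathbf{x}_j \right)
= (\mathbf{y}, A\mathbf{x}_j) - \sum_{k=1}^{r} (\mathbf{y}, A\mathbf{x}_k)(A\mathbf{x}_k, A\mathbf{x}_j) = 0.
\]

In particular

\[
\left( \mathbf{y} - A\left( \sum_{k=1}^{r} (\mathbf{y}, A\mathbf{x}_k)\,\mathbf{x}_k \right),\; \mathbf{w} \right) = 0
\]

for all $\mathbf{w} \in A\mathbb{F}^n$ since $\{A\mathbf{x}_k\}$ is a basis. Also note that by Lemma 13.3.2,

\[
\left| \sum_{k=1}^{r} a_kA\mathbf{x}_k \right|^2 = \sum_{k=1}^{r} |a_k|^2
\]

because the $\{A\mathbf{x}_k\}$ are orthonormal. Therefore, letting

\[
\mathbf{x} = \sum_{k=1}^{r} (\mathbf{y}, A\mathbf{x}_k)\,\mathbf{x}_k,
\]

for arbitrary scalars $y_1, \cdots, y_r$,

\[
\begin{aligned}
\left| \mathbf{y} - \sum_{k=1}^{r} y_kA\mathbf{x}_k \right|^2
&= \left| \mathbf{y} - A\mathbf{x} + \sum_{k=1}^{r} \big( (\mathbf{y}, A\mathbf{x}_k) - y_k \big) A\mathbf{x}_k \right|^2 \\
&= |\mathbf{y} - A\mathbf{x}|^2 + 2\operatorname{Re}\left( \mathbf{y} - A\mathbf{x},\; \sum_{k=1}^{r} \big( (\mathbf{y}, A\mathbf{x}_k) - y_k \big) A\mathbf{x}_k \right)
+ \left| \sum_{k=1}^{r} \big( (\mathbf{y}, A\mathbf{x}_k) - y_k \big) A\mathbf{x}_k \right|^2 \\
&= |\mathbf{y} - A\mathbf{x}|^2 + \sum_{k=1}^{r} \left| (\mathbf{y}, A\mathbf{x}_k) - y_k \right|^2.
\end{aligned}
\]

It follows that the minimum exists and occurs when $y_k = (\mathbf{y}, A\mathbf{x}_k)$ for each $k$. The above shows
that $A\mathbf{x}$ is the closest vector in $A\mathbb{F}^n$ to $\mathbf{y}$ and that $(\mathbf{y} - A\mathbf{x}, \mathbf{w}) = 0$ for all $\mathbf{w} \in A\mathbb{F}^n$.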
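Now suppose $\mathbf{x}$ is such that $(\mathbf{y} - A\mathbf{x}, \mathbf{w}) = 0$ for all $\mathbf{w} \in A\mathbb{F}^n$. Why is $A\mathbf{x}$ as close as possible to
$\mathbf{y}$? Letting $\mathbf{z} \in \mathbb{F}^n$,

\[
\begin{aligned}
|\mathbf{y} - A\mathbf{z}|^2 &= |\mathbf{y} - A\mathbf{x} + A(\mathbf{x} - \mathbf{z})|^2 \\
&= |\mathbf{y} - A\mathbf{x}|^2 + |A(\mathbf{x} - \mathbf{z})|^2 + 2\operatorname{Re}\,(\mathbf{y} - A\mathbf{x}, A(\mathbf{x} - \mathbf{z})) \\
&= |\mathbf{y} - A\mathbf{x}|^2 + |A(\mathbf{x} - \mathbf{z})|^2
\end{aligned}
\]

and so the smallest value of $|\mathbf{y} - A\mathbf{z}|$ is obtained when $A\mathbf{x} = A\mathbf{z}$. Thus $A\mathbf{x}$ is as close as possible to
$\mathbf{y}$ if the orthogonality condition holds. ■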

Recall the definition of the adjoint of a matrix. 

Definition 13.3.4 Let $A$ be an $m \times n$ matrix. Then

\[
A^* \equiv \overline{(A^T)}.
\]

This means you take the transpose of $A$ and then replace each entry by its conjugate. This matrix
is called the adjoint. Thus in the case of real matrices having only real entries, the adjoint is just
the transpose.

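Lemma 13.3.5 Let $A$ be an $m \times n$ matrix. Then

\[
A\mathbf{x} \cdot \mathbf{y} = \mathbf{x} \cdot A^*\mathbf{y}.
\]

Proof: This follows from the definition.

\[
A\mathbf{x} \cdot \mathbf{y} = \sum_{i,j} A_{ij}x_j\overline{y_i}
= \sum_{i,j} x_j\,\overline{(A^*)_{ji}\,y_i}
= \mathbf{x} \cdot A^*\mathbf{y}. \quad ■
\]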
The next corollary gives the technique of least squares. 

Corollary 13.3.6 A value of $\mathbf{x}$ which solves the problem of Theorem 13.3.3 is obtained by solving
the equation

\[
A^*A\mathbf{x} = A^*\mathbf{y}
\]

and furthermore, there exists a solution to this system of equations.

Proof: For $\mathbf{x}$ the minimizer of Theorem 13.3.3, $(\mathbf{y} - A\mathbf{x}) \cdot A\mathbf{w} = 0$ for all $\mathbf{w} \in \mathbb{F}^n$ and from
Lemma 13.3.5, this is the same as saying

\[
A^*(\mathbf{y} - A\mathbf{x}) \cdot \mathbf{w} = 0
\]

for all $\mathbf{w} \in \mathbb{F}^n$. This implies

\[
A^*\mathbf{y} - A^*A\mathbf{x} = \mathbf{0}.
\]

Therefore, there is a solution to the equation of this corollary, and it solves the minimization problem
of Theorem 13.3.3. ■

Note that $\mathbf{x}$ might not be unique but $A\mathbf{x}$, the closest point of $A\mathbb{F}^n$ to $\mathbf{y}$, is unique. This was
shown in the above argument. Sometimes people like to consider the $\mathbf{x}$ such that $A\mathbf{x}$ is as close as
possible to $\mathbf{y}$ and also $|\mathbf{x}|$ is as small as possible. It turns out that there exists a unique such $\mathbf{x}$ and
it is denoted as $A^+\mathbf{y}$. However, this is as far as I will go with this in this part of the book.
13.3.1 The Least Squares Regression Line

For the situation of the least squares regression line discussed here I will specialize to the case of $\mathbb{R}^n$
rather than $\mathbb{F}^n$ because it seems this case is by far the most interesting and the extra details are not
justified by an increase in utility. Thus, everywhere you see $A^*$ it suffices to place $A^T$.

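An important application of Corollary 13.3.6 is the problem of finding the least squares regression
line in statistics. Suppose you are given points in the $xy$ plane

\[
\{(x_i, y_i)\}_{i=1}^{n}
\]

and you would like to find constants $m$ and $b$ such that the line $y = mx + b$ goes through all these
points. Of course this will be impossible in general. Therefore, try to find $m, b$ to get as close as
possible. The desired system is

\[
\begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}
= \begin{pmatrix} x_1 & 1 \\ \vdots & \vdots \\ x_n & 1 \end{pmatrix}
\begin{pmatrix} m \\ b \end{pmatrix}
\]

which is of the form $\mathbf{y} = A\mathbf{x}$ and it is desired to choose $m$ and $b$ to make

\[
\left| A\begin{pmatrix} m \\ b \end{pmatrix} - \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} \right|^2
\]

as small as possible. According to Theorem 13.3.3 and Corollary 13.3.6, the best values for $m$ and
$b$ occur as the solution to

\[
A^TA\begin{pmatrix} m \\ b \end{pmatrix} = A^T\begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}
\]

where

\[
A = \begin{pmatrix} x_1 & 1 \\ \vdots & \vdots \\ x_n & 1 \end{pmatrix}.
\]

Thus, computing $A^TA$,

\[
\begin{pmatrix} \sum_{i=1}^{n} x_i^2 & \sum_{i=1}^{n} x_i \\ \sum_{i=1}^{n} x_i & n \end{pmatrix}
\begin{pmatrix} m \\ b \end{pmatrix}
= \begin{pmatrix} \sum_{i=1}^{n} x_iy_i \\ \sum_{i=1}^{n} y_i \end{pmatrix}.
\]

Solving this system of equations for $m$ and $b$,

\[
m = \frac{-\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right) + \left(\sum_{i=1}^{n} x_iy_i\right)n}{\left(\sum_{i=1}^{n} x_i^2\right)n - \left(\sum_{i=1}^{n} x_i\right)^2}
\]

and

\[
b = \frac{-\left(\sum_{i=1}^{n} x_i\right)\sum_{i=1}^{n} x_iy_i + \left(\sum_{i=1}^{n} y_i\right)\sum_{i=1}^{n} x_i^2}{\left(\sum_{i=1}^{n} x_i^2\right)n - \left(\sum_{i=1}^{n} x_i\right)^2}.
\]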
One could clearly do a least squares fit for curves of the form $y = ax^2 + bx + c$ in the same way.
In this case you want to solve as well as possible for $a$, $b$, and $c$ the system

\[
\begin{pmatrix} x_1^2 & x_1 & 1 \\ \vdots & \vdots & \vdots \\ x_n^2 & x_n & 1 \end{pmatrix}
\begin{pmatrix} a \\ b \\ c \end{pmatrix}
= \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}
\]

and one would use the same technique as above. Many other similar problems are important,
including many in higher dimensions and they are all solved the same way.
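The closed form solution of the $2 \times 2$ normal equations for the regression line can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `regression_line` and the sample points are made up for this example, and exact rational arithmetic from the standard `fractions` module is used so that no rounding enters.

```python
from fractions import Fraction

def regression_line(points):
    """Return (m, b) for the least squares line y = m*x + b.

    Solves the normal equations A^T A (m, b)^T = A^T y, where the rows
    of A are (x_i, 1), using the explicit 2 x 2 solution formulas.
    """
    n = len(points)
    sx = sum(Fraction(x) for x, _ in points)                 # sum of x_i
    sy = sum(Fraction(y) for _, y in points)                 # sum of y_i
    sxx = sum(Fraction(x) ** 2 for x, _ in points)           # sum of x_i^2
    sxy = sum(Fraction(x) * Fraction(y) for x, y in points)  # sum of x_i*y_i
    det = sxx * n - sx ** 2
    if det == 0:
        # Happens exactly when all x_i coincide; then A^T A is singular.
        raise ValueError("all x_i are equal, so the line is not unique")
    m = (sxy * n - sx * sy) / det
    b = (sy * sxx - sx * sxy) / det
    return m, b

# Hypothetical sample data, chosen only for illustration.
sample = [(0, 0), (1, 1), (2, 1)]
m, b = regression_line(sample)

# The residual y - A(m, b)^T should be orthogonal to both columns of A.
residuals = [Fraction(y) - (m * x + b) for x, y in sample]
```

For the sample data the residual vector is orthogonal to the column of ones and to the column of the $x_i$, which is the orthogonality condition of Theorem 13.3.3 written out for this $A$.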



13.3.2 The Fredholm Alternative 

The next major result is called the Fredholm alternative. It comes from Theorem 13.3.3 and Lemma 
13.3.5. 

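Theorem 13.3.7 Let $A$ be an $m \times n$ matrix. Then there exists $\mathbf{x} \in \mathbb{F}^n$ such that $A\mathbf{x} = \mathbf{y}$ if and
only if whenever $A^*\mathbf{z} = \mathbf{0}$ it follows that $\mathbf{z} \cdot \mathbf{y} = 0$.

Proof: First suppose that for some $\mathbf{x} \in \mathbb{F}^n$, $A\mathbf{x} = \mathbf{y}$. Then letting $A^*\mathbf{z} = \mathbf{0}$ and using Lemma
13.3.5

\[
\mathbf{y} \cdot \mathbf{z} = A\mathbf{x} \cdot \mathbf{z} = \mathbf{x} \cdot A^*\mathbf{z} = \mathbf{x} \cdot \mathbf{0} = 0.
\]

This proves half the theorem.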

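To do the other half, suppose that whenever $A^*\mathbf{z} = \mathbf{0}$ it follows that $\mathbf{z} \cdot \mathbf{y} = 0$. It is necessary
to show there exists $\mathbf{x} \in \mathbb{F}^n$ such that $\mathbf{y} = A\mathbf{x}$. From Theorem 13.3.3 there exists $\mathbf{x}$ minimizing
$|\mathbf{y} - A\mathbf{x}|^2$ which therefore satisfies

\[
(\mathbf{y} - A\mathbf{x}) \cdot A\mathbf{w} = 0 \tag{13.11}
\]

for all $\mathbf{w} \in \mathbb{F}^n$. Therefore, for all $\mathbf{w} \in \mathbb{F}^n$,

\[
A^*(\mathbf{y} - A\mathbf{x}) \cdot \mathbf{w} = 0
\]

which shows that $A^*(\mathbf{y} - A\mathbf{x}) = \mathbf{0}$. (Why?) Therefore, by assumption,

\[
(\mathbf{y} - A\mathbf{x}) \cdot \mathbf{y} = 0.
\]

Now by 13.11 with $\mathbf{w} = \mathbf{x}$,

\[
(\mathbf{y} - A\mathbf{x}) \cdot (\mathbf{y} - A\mathbf{x}) = (\mathbf{y} - A\mathbf{x}) \cdot \mathbf{y} - (\mathbf{y} - A\mathbf{x}) \cdot A\mathbf{x} = 0
\]

showing that $\mathbf{y} = A\mathbf{x}$. ■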

The following corollary is also called the Fredholm alternative. 

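Corollary 13.3.8 Let $A$ be an $m \times n$ matrix. Then $A$ is onto if and only if $A^*$ is one to one.

Proof: Suppose first $A$ is onto. Then by Theorem 13.3.7, it follows that for all $\mathbf{y} \in \mathbb{F}^m$, $\mathbf{y} \cdot \mathbf{z} = 0$
whenever $A^*\mathbf{z} = \mathbf{0}$. Therefore, let $\mathbf{y} = \mathbf{z}$ where $A^*\mathbf{z} = \mathbf{0}$ and conclude that $\mathbf{z} \cdot \mathbf{z} = 0$ whenever $A^*\mathbf{z} = \mathbf{0}$.
If $A^*\mathbf{x} = A^*\mathbf{y}$, then $A^*(\mathbf{x} - \mathbf{y}) = \mathbf{0}$ and so $\mathbf{x} - \mathbf{y} = \mathbf{0}$. Thus $A^*$ is one to one.

Now suppose $A^*$ is one to one and let $\mathbf{y} \in \mathbb{F}^m$ be given. Then $\mathbf{y} \cdot \mathbf{z} = 0$ whenever $A^*\mathbf{z} = \mathbf{0}$ because, since $A^*$ is assumed to be one
to one, and $\mathbf{0}$ is a solution to this equation, it must be the only solution. Therefore, by Theorem
13.3.7 there exists $\mathbf{x}$ such that $A\mathbf{x} = \mathbf{y}$. Therefore, $A$ is onto. ■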

13.4 The Right Polar Factorization* 

The right polar factorization involves writing a matrix as a product of two other matrices, one which
preserves distances and the other which stretches and distorts. This factorization is of fundamental
significance in geometric measure theory and also in continuum mechanics. Not surprisingly the stress
should depend on the part which stretches and distorts. See [6]. First here are some lemmas which
review and add to many of the topics discussed so far about adjoints and orthonormal sets and
such things.

Lemma 13.4.1 Let $A$ be a Hermitian matrix such that all its eigenvalues are nonnegative. Then
there exists a Hermitian matrix $A^{1/2}$ such that $A^{1/2}$ has all nonnegative eigenvalues and $\left(A^{1/2}\right)^2 = A$.

Proof: Since $A$ is Hermitian, there exists a diagonal matrix $D$ having all real nonnegative entries
and a unitary matrix $U$ such that $A = U^*DU$. Then denote by $D^{1/2}$ the matrix which is obtained
by replacing each diagonal entry of $D$ with its square root. Thus $D^{1/2}D^{1/2} = D$. Then define

\[
A^{1/2} \equiv U^*D^{1/2}U.
\]

Then

\[
\left(A^{1/2}\right)^2 = U^*D^{1/2}UU^*D^{1/2}U = U^*DU = A.
\]

Since $D^{1/2}$ is real,

\[
\left(U^*D^{1/2}U\right)^* = U^*\left(D^{1/2}\right)^*(U^*)^* = U^*D^{1/2}U
\]

so $A^{1/2}$ is Hermitian. ■
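The construction $A^{1/2} = U^*D^{1/2}U$ can be tried on a small example. The sketch below uses an illustrative real symmetric matrix with eigenvalues 3 and 1, chosen here and not taken from the text, together with a hand-picked orthogonal $U$ for which $A = U^TDU$. The helper names are hypothetical.

```python
import math

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    """Transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*X)]

# Illustrative data: A = U^T D U with D = diag(3, 1).
A = [[2.0, 1.0], [1.0, 2.0]]
s = 1.0 / math.sqrt(2.0)
U = [[s, s], [s, -s]]

# D^{1/2}: replace each diagonal entry of D by its square root.
D_half = [[math.sqrt(3.0), 0.0], [0.0, 1.0]]

A_half = matmul(transpose(U), matmul(D_half, U))  # candidate square root
A_again = matmul(A_half, A_half)                  # should reproduce A
```

Up to rounding, `A_again` agrees with `A` and `A_half` is symmetric, matching the two conclusions of the proof in the real case.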

Next it is helpful to recall the Gram Schmidt algorithm and observe a certain property stated in 
the next lemma. 

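Lemma 13.4.2 Suppose $\{\mathbf{w}_1, \cdots, \mathbf{w}_r, \mathbf{v}_{r+1}, \cdots, \mathbf{v}_p\}$ is a linearly independent set of vectors such
that $\{\mathbf{w}_1, \cdots, \mathbf{w}_r\}$ is an orthonormal set of vectors. Then when the Gram Schmidt process is applied
to the vectors in the given order, it will not change any of the $\mathbf{w}_1, \cdots, \mathbf{w}_r$.

Proof: Let $\{\mathbf{u}_1, \cdots, \mathbf{u}_p\}$ be the orthonormal set delivered by the Gram Schmidt process. Then
$\mathbf{u}_1 = \mathbf{w}_1$ because by definition, $\mathbf{u}_1 \equiv \mathbf{w}_1 / |\mathbf{w}_1| = \mathbf{w}_1$. Now suppose $\mathbf{u}_j = \mathbf{w}_j$ for all $j \leq k \leq r$. Then
if $k < r$, consider the definition of $\mathbf{u}_{k+1}$.

\[
\mathbf{u}_{k+1} \equiv \frac{\mathbf{w}_{k+1} - \sum_{j=1}^{k} (\mathbf{w}_{k+1}, \mathbf{u}_j)\,\mathbf{u}_j}{\left| \mathbf{w}_{k+1} - \sum_{j=1}^{k} (\mathbf{w}_{k+1}, \mathbf{u}_j)\,\mathbf{u}_j \right|}
\]

By induction, $\mathbf{u}_j = \mathbf{w}_j$ and so this reduces to $\mathbf{w}_{k+1} / |\mathbf{w}_{k+1}| = \mathbf{w}_{k+1}$. ■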
This lemma immediately implies the following lemma. 

Lemma 13.4.3 Let $V$ be a subspace of dimension $p$ and let $\{\mathbf{w}_1, \cdots, \mathbf{w}_r\}$ be an orthonormal set of
vectors in $V$. Then this orthonormal set of vectors may be extended to an orthonormal basis for $V$,

\[
\{\mathbf{w}_1, \cdots, \mathbf{w}_r, \mathbf{y}_{r+1}, \cdots, \mathbf{y}_p\}
\]

Proof: First extend the given linearly independent set $\{\mathbf{w}_1, \cdots, \mathbf{w}_r\}$ to a basis for $V$ and then
apply the Gram Schmidt process to the resulting basis. Since $\{\mathbf{w}_1, \cdots, \mathbf{w}_r\}$ is orthonormal it follows
from Lemma 13.4.2 the result is of the desired form, an orthonormal basis extending $\{\mathbf{w}_1, \cdots, \mathbf{w}_r\}$.
■
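Lemma 13.4.2 is easy to watch in action. The following is a minimal sketch of the Gram Schmidt process for real vectors, assuming the usual real dot product. The function name `gram_schmidt` and the three input vectors are illustrative choices, not part of the text.

```python
import math

def dot(u, v):
    """Real dot product of two vectors stored as lists."""
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(vectors):
    """Orthonormalize linearly independent real vectors in the given order."""
    basis = []
    for v in vectors:
        # Subtract from v its components along the vectors produced so far.
        w = list(v)
        for u in basis:
            c = dot(v, u)
            w = [wi - c * ui for wi, ui in zip(w, u)]
        norm = math.sqrt(dot(w, w))
        if norm == 0.0:
            raise ValueError("the vectors are linearly dependent")
        basis.append([wi / norm for wi in w])
    return basis

# w1, w2 are already orthonormal; v3 completes a linearly independent set.
w1 = [1.0, 0.0, 0.0]
w2 = [0.0, 1.0, 0.0]
v3 = [1.0, 1.0, 1.0]
u1, u2, u3 = gram_schmidt([w1, w2, v3])
```

Here `u1` and `u2` come back equal to `w1` and `w2`, as the lemma predicts, and only `v3` is changed. This is exactly the mechanism used in Lemma 13.4.3 to extend an orthonormal set to an orthonormal basis.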

Here is another lemma about preserving distance. 

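Lemma 13.4.4 Suppose $R$ is an $m \times n$ matrix with $m \geq n$ and $R$ preserves distances. Then
$R^*R = I$.

Proof: Since $R$ preserves distances, $|R\mathbf{x}| = |\mathbf{x}|$ for every $\mathbf{x}$. Therefore from the axioms of the dot
product,

\[
\begin{aligned}
|\mathbf{x}|^2 + |\mathbf{y}|^2 + (\mathbf{x}, \mathbf{y}) + (\mathbf{y}, \mathbf{x})
&= |\mathbf{x} + \mathbf{y}|^2 \\
&= (R(\mathbf{x} + \mathbf{y}), R(\mathbf{x} + \mathbf{y})) \\
&= (R\mathbf{x}, R\mathbf{x}) + (R\mathbf{y}, R\mathbf{y}) + (R\mathbf{x}, R\mathbf{y}) + (R\mathbf{y}, R\mathbf{x}) \\
&= |\mathbf{x}|^2 + |\mathbf{y}|^2 + (R^*R\mathbf{x}, \mathbf{y}) + (\mathbf{y}, R^*R\mathbf{x})
\end{aligned}
\]

and so for all $\mathbf{x}, \mathbf{y}$,

\[
(R^*R\mathbf{x} - \mathbf{x}, \mathbf{y}) + (\mathbf{y}, R^*R\mathbf{x} - \mathbf{x}) = 0.
\]

Hence for all $\mathbf{x}, \mathbf{y}$,

\[
\operatorname{Re}\,(R^*R\mathbf{x} - \mathbf{x}, \mathbf{y}) = 0.
\]

Now for $\mathbf{x}, \mathbf{y}$ given, choose $\alpha \in \mathbb{C}$ such that

\[
\alpha\,(R^*R\mathbf{x} - \mathbf{x}, \mathbf{y}) = |(R^*R\mathbf{x} - \mathbf{x}, \mathbf{y})|.
\]

Then

\[
0 = \operatorname{Re}\,(R^*R\mathbf{x} - \mathbf{x}, \overline{\alpha}\mathbf{y}) = \operatorname{Re}\,\alpha\,(R^*R\mathbf{x} - \mathbf{x}, \mathbf{y})
= |(R^*R\mathbf{x} - \mathbf{x}, \mathbf{y})|.
\]

Thus $|(R^*R\mathbf{x} - \mathbf{x}, \mathbf{y})| = 0$ for all $\mathbf{x}, \mathbf{y}$ because the given $\mathbf{x}, \mathbf{y}$ were arbitrary. Let $\mathbf{y} = R^*R\mathbf{x} - \mathbf{x}$ to
conclude that for all $\mathbf{x}$,

\[
R^*R\mathbf{x} - \mathbf{x} = \mathbf{0}
\]

which says $R^*R = I$ since $\mathbf{x}$ is arbitrary. ■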

With this preparation, here is the big theorem about the right polar factorization. 
Theorem 13.4.5 Let $F$ be an $m \times n$ matrix where $m \geq n$. Then there exists a Hermitian $n \times n$
matrix $U$ which has all nonnegative eigenvalues and an $m \times n$ matrix $R$ which preserves distances
and satisfies $R^*R = I$ such that

\[
F = RU.
\]

Proof: Consider $F^*F$. This is a Hermitian matrix because

\[
(F^*F)^* = F^*(F^*)^* = F^*F.
\]

Also the eigenvalues of the $n \times n$ matrix $F^*F$ are all nonnegative. This is because if $\mathbf{x}$ is an eigenvector
with eigenvalue $\lambda$,

\[
\lambda\,(\mathbf{x}, \mathbf{x}) = (F^*F\mathbf{x}, \mathbf{x}) = (F\mathbf{x}, F\mathbf{x}) \geq 0.
\]

Therefore, by Lemma 13.4.1, there exists an $n \times n$ Hermitian matrix $U$ having all nonnegative
eigenvalues such that

\[
U^2 = F^*F.
\]



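Consider the subspace $U(\mathbb{F}^n)$. Let $\{U\mathbf{x}_1, \cdots, U\mathbf{x}_r\}$ be an orthonormal basis for

\[
U(\mathbb{F}^n) \subseteq \mathbb{F}^n.
\]

Note that $U(\mathbb{F}^n)$ might not be all of $\mathbb{F}^n$. Using Lemma 13.4.3, extend to an orthonormal basis for
all of $\mathbb{F}^n$,

\[
\{U\mathbf{x}_1, \cdots, U\mathbf{x}_r, \mathbf{y}_{r+1}, \cdots, \mathbf{y}_n\}.
\]

Next observe that $\{F\mathbf{x}_1, \cdots, F\mathbf{x}_r\}$ is also an orthonormal set of vectors in $\mathbb{F}^m$. This is because

\[
(F\mathbf{x}_k, F\mathbf{x}_j) = (F^*F\mathbf{x}_k, \mathbf{x}_j) = (U^2\mathbf{x}_k, \mathbf{x}_j)
= (U\mathbf{x}_k, U^*\mathbf{x}_j) = (U\mathbf{x}_k, U\mathbf{x}_j) = \delta_{jk}.
\]

Therefore, from Lemma 13.4.3 again, this orthonormal set of vectors can be extended to an orthonormal
basis for $\mathbb{F}^m$,

\[
\{F\mathbf{x}_1, \cdots, F\mathbf{x}_r, \mathbf{z}_{r+1}, \cdots, \mathbf{z}_m\}.
\]

Thus there are at least as many $\mathbf{z}_k$ as there are $\mathbf{y}_k$. Now for $\mathbf{x} \in \mathbb{F}^n$, since

\[
\{U\mathbf{x}_1, \cdots, U\mathbf{x}_r, \mathbf{y}_{r+1}, \cdots, \mathbf{y}_n\}
\]

is an orthonormal basis for $\mathbb{F}^n$, there exist unique scalars

\[
c_1, \cdots, c_r, d_{r+1}, \cdots, d_n
\]

such that

\[
\mathbf{x} = \sum_{k=1}^{r} c_kU\mathbf{x}_k + \sum_{k=r+1}^{n} d_k\mathbf{y}_k.
\]

Define

\[
R\mathbf{x} \equiv \sum_{k=1}^{r} c_kF\mathbf{x}_k + \sum_{k=r+1}^{n} d_k\mathbf{z}_k. \tag{13.12}
\]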

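Then also there exist scalars $b_k$ such that

\[
U\mathbf{x} = \sum_{k=1}^{r} b_kU\mathbf{x}_k
\]

and so from 13.12,

\[
RU\mathbf{x} = \sum_{k=1}^{r} b_kF\mathbf{x}_k = F\left( \sum_{k=1}^{r} b_k\mathbf{x}_k \right).
\]

Is $F\left( \sum_{k=1}^{r} b_k\mathbf{x}_k \right) = F(\mathbf{x})$?

\[
\begin{aligned}
&\left( F\Big( \sum_{k=1}^{r} b_k\mathbf{x}_k \Big) - F(\mathbf{x}),\; F\Big( \sum_{k=1}^{r} b_k\mathbf{x}_k \Big) - F(\mathbf{x}) \right) \\
&\quad = \left( (F^*F)\Big( \sum_{k=1}^{r} b_k\mathbf{x}_k - \mathbf{x} \Big),\; \Big( \sum_{k=1}^{r} b_k\mathbf{x}_k - \mathbf{x} \Big) \right) \\
&\quad = \left( U^2\Big( \sum_{k=1}^{r} b_k\mathbf{x}_k - \mathbf{x} \Big),\; \Big( \sum_{k=1}^{r} b_k\mathbf{x}_k - \mathbf{x} \Big) \right) \\
&\quad = \left( U\Big( \sum_{k=1}^{r} b_k\mathbf{x}_k - \mathbf{x} \Big),\; U\Big( \sum_{k=1}^{r} b_k\mathbf{x}_k - \mathbf{x} \Big) \right) \\
&\quad = \left( \sum_{k=1}^{r} b_kU\mathbf{x}_k - U\mathbf{x},\; \sum_{k=1}^{r} b_kU\mathbf{x}_k - U\mathbf{x} \right) = 0.
\end{aligned}
\]

Therefore, $F\left( \sum_{k=1}^{r} b_k\mathbf{x}_k \right) = F(\mathbf{x})$ and this shows

\[
RU\mathbf{x} = F\mathbf{x}.
\]

From 13.12 and Lemma 13.3.2 $R$ preserves distances. Therefore, by Lemma 13.4.4 $R^*R = I$. ■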
13.5 The Singular Value Decomposition

In this section, $A$ will be an $m \times n$ matrix. To begin with, here is a simple lemma.

Lemma 13.5.1 Let $A$ be an $m \times n$ matrix. Then $A^*A$ is self adjoint and all its eigenvalues are
nonnegative.

Proof: It is obvious that $A^*A$ is self adjoint. Suppose $A^*A\mathbf{x} = \lambda\mathbf{x}$. Then $\lambda|\mathbf{x}|^2 = (\lambda\mathbf{x}, \mathbf{x}) =
(A^*A\mathbf{x}, \mathbf{x}) = (A\mathbf{x}, A\mathbf{x}) \geq 0$. ■

Definition 13.5.2 Let $A$ be an $m \times n$ matrix. The singular values of $A$ are the square roots of the
positive eigenvalues of $A^*A$.

With this definition and lemma here is the main theorem on the singular value decomposition. 

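Theorem 13.5.3 Let $A$ be an $m \times n$ matrix. Then there exist unitary matrices, $U$ and $V$ of the
appropriate size such that

\[
U^*AV = \begin{pmatrix} \sigma & 0 \\ 0 & 0 \end{pmatrix}
\]

where $\sigma$ is of the form

\[
\sigma = \begin{pmatrix} \sigma_1 & & 0 \\ & \ddots & \\ 0 & & \sigma_k \end{pmatrix}
\]

for the $\sigma_i$ the singular values of $A$.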



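Proof: By the above lemma and Theorem 13.2.14 there exists an orthonormal basis, $\{\mathbf{v}_i\}_{i=1}^{n}$
such that $A^*A\mathbf{v}_i = \sigma_i^2\mathbf{v}_i$ where $\sigma_i^2 > 0$ for $i = 1, \cdots, k$, $(\sigma_i > 0)$, and equals zero if $i > k$. Thus for
$i > k$, $A\mathbf{v}_i = \mathbf{0}$ because

\[
(A\mathbf{v}_i, A\mathbf{v}_i) = (A^*A\mathbf{v}_i, \mathbf{v}_i) = (\mathbf{0}, \mathbf{v}_i) = 0.
\]

For $i = 1, \cdots, k$, define $\mathbf{u}_i \in \mathbb{F}^m$ by

\[
\mathbf{u}_i \equiv \sigma_i^{-1}A\mathbf{v}_i.
\]

Thus $A\mathbf{v}_i = \sigma_i\mathbf{u}_i$. Now

\[
(\mathbf{u}_i, \mathbf{u}_j) = \left( \sigma_i^{-1}A\mathbf{v}_i, \sigma_j^{-1}A\mathbf{v}_j \right)
= \left( \sigma_i^{-1}\mathbf{v}_i, \sigma_j^{-1}A^*A\mathbf{v}_j \right)
= \left( \sigma_i^{-1}\mathbf{v}_i, \sigma_j^{-1}\sigma_j^2\mathbf{v}_j \right)
= \frac{\sigma_j}{\sigma_i}(\mathbf{v}_i, \mathbf{v}_j) = \delta_{ij}.
\]

Thus $\{\mathbf{u}_i\}_{i=1}^{k}$ is an orthonormal set of vectors in $\mathbb{F}^m$. Also,

\[
AA^*\mathbf{u}_i = AA^*\sigma_i^{-1}A\mathbf{v}_i = \sigma_i^{-1}AA^*A\mathbf{v}_i = \sigma_i^{-1}A\sigma_i^2\mathbf{v}_i = \sigma_i^2\mathbf{u}_i.
\]

Now extend $\{\mathbf{u}_i\}_{i=1}^{k}$ to an orthonormal basis for all of $\mathbb{F}^m$, $\{\mathbf{u}_i\}_{i=1}^{m}$, and let

\[
U \equiv \begin{pmatrix} \mathbf{u}_1 & \cdots & \mathbf{u}_m \end{pmatrix}
\]

while $V \equiv \begin{pmatrix} \mathbf{v}_1 & \cdots & \mathbf{v}_n \end{pmatrix}$. Thus $U$ is the matrix which has the $\mathbf{u}_i$ as columns and $V$ is defined as the
matrix which has the $\mathbf{v}_i$ as columns. Then

\[
U^*AV = \begin{pmatrix} \mathbf{u}_1^* \\ \vdots \\ \mathbf{u}_m^* \end{pmatrix} A \begin{pmatrix} \mathbf{v}_1 & \cdots & \mathbf{v}_n \end{pmatrix}
= \begin{pmatrix} \mathbf{u}_1^* \\ \vdots \\ \mathbf{u}_m^* \end{pmatrix}
\begin{pmatrix} \sigma_1\mathbf{u}_1 & \cdots & \sigma_k\mathbf{u}_k & \mathbf{0} & \cdots & \mathbf{0} \end{pmatrix}
= \begin{pmatrix} \sigma & 0 \\ 0 & 0 \end{pmatrix}
\]

where $\sigma$ is given in the statement of the theorem. ■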

The singular value decomposition has as an immediate corollary the following interesting result. 

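Corollary 13.5.4 Let $A$ be an $m \times n$ matrix. Then the rank of $A$ and $A^*$ equals the number of
singular values.

Proof: Since $V$ and $U$ are unitary, it follows that

\[
\operatorname{rank}(A) = \operatorname{rank}(U^*AV) = \operatorname{rank}\begin{pmatrix} \sigma & 0 \\ 0 & 0 \end{pmatrix} = \text{number of singular values}.
\]

Also since $U, V$ are unitary,

\[
\operatorname{rank}(A^*) = \operatorname{rank}(V^*A^*U) = \operatorname{rank}\left( (U^*AV)^* \right)
= \operatorname{rank}\left( \begin{pmatrix} \sigma & 0 \\ 0 & 0 \end{pmatrix}^* \right) = \text{number of singular values}. \quad ■
\]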

13.6 Approximation In The Frobenius Norm*

The Frobenius norm is one of many norms for a matrix. It is arguably the most obvious of all norms.
Here is its definition.

Definition 13.6.1 Let $A$ be a complex $m \times n$ matrix. Then

\[
\|A\|_F \equiv \left( \operatorname{trace}(AA^*) \right)^{1/2}.
\]

Also this norm comes from the inner product

\[
(A, B)_F \equiv \operatorname{trace}(AB^*).
\]

Thus $\|A\|_F^2$ is easily seen to equal $\sum_{ij} |a_{ij}|^2$ so essentially, it treats the matrix as a vector in $\mathbb{F}^{m \times n}$.
Lemma 13.6.2 Let $A$ be an $m \times n$ complex matrix with singular matrix

\[
\Sigma = \begin{pmatrix} \sigma & 0 \\ 0 & 0 \end{pmatrix}
\]

with $\sigma$ as defined above. Then

\[
\|\Sigma\|_F^2 = \|A\|_F^2 \tag{13.13}
\]

and the following hold for the Frobenius norm. If $U, V$ are unitary and of the right size,

\[
\|UA\|_F = \|A\|_F, \quad \|UAV\|_F = \|A\|_F. \tag{13.14}
\]

Proof: From the definition and letting $U, V$ be unitary and of the right size,

\[
\|UA\|_F^2 = \operatorname{trace}(UAA^*U^*) = \operatorname{trace}(AA^*) = \|A\|_F^2.
\]

Also,

\[
\|AV\|_F^2 = \operatorname{trace}(AVV^*A^*) = \operatorname{trace}(AA^*) = \|A\|_F^2.
\]

It follows

\[
\|UAV\|_F^2 = \|AV\|_F^2 = \|A\|_F^2.
\]

Now consider 13.13. From what was just shown,

\[
\|A\|_F^2 = \|U\Sigma V^*\|_F^2 = \|\Sigma\|_F^2. \quad ■
\]

Of course, this shows that

\[
\|A\|_F^2 = \sum_i \sigma_i^2,
\]

the sum of the squares of the singular values of $A$.
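The two descriptions of the Frobenius norm, and its invariance under multiplication by a unitary matrix, can be checked by hand on a small real matrix. In the sketch below the matrix `A` and the permutation matrix `P` (a simple example of a real unitary matrix) are illustrative choices, and for real matrices $A^*$ is the transpose.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    """Transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*X)]

def trace(M):
    """Sum of the diagonal entries of a square matrix."""
    return sum(M[i][i] for i in range(len(M)))

def frobenius_squared(M):
    """||M||_F^2 computed from the definition trace(M M^T) for real M."""
    return trace(matmul(M, transpose(M)))

A = [[1, 2, 0], [0, 3, 4]]   # illustrative 2 x 3 real matrix
P = [[0, 1], [1, 0]]         # permutation matrix, hence orthogonal

via_trace = frobenius_squared(A)
via_entries = sum(a * a for row in A for a in row)
after_unitary = frobenius_squared(matmul(P, A))
```

All three numbers agree: the trace formula equals the sum of the squared entries, and multiplying by `P` only reorders the rows.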

Why is the singular value decomposition important? It implies

\[
A = U\begin{pmatrix} \sigma & 0 \\ 0 & 0 \end{pmatrix}V^*
\]

where $\sigma$ is the diagonal matrix having the singular values down the diagonal. Now sometimes $A$ is
a huge matrix, $1000 \times 2000$ or something like that. This happens in applications to situations where
the entries of $A$ describe a picture. What also happens is that most of the singular values are very
small. What if you deleted those which were very small, say for all $i > l$, and got a new matrix

\[
A' \equiv U\begin{pmatrix} \sigma' & 0 \\ 0 & 0 \end{pmatrix}V^*?
\]

Then the entries of $A'$ would end up being close to the entries of $A$ but there is much less information
to keep track of. This turns out to be very useful. More precisely, letting $r$ denote the number of
singular values and

\[
\sigma' = \operatorname{diag}(\sigma_1, \cdots, \sigma_l, 0, \cdots, 0), \quad
U^*AV = \begin{pmatrix} \sigma & 0 \\ 0 & 0 \end{pmatrix},
\]

\[
\|A - A'\|_F^2 = \left\| U\begin{pmatrix} \sigma - \sigma' & 0 \\ 0 & 0 \end{pmatrix}V^* \right\|_F^2 = \sum_{k=l+1}^{r} \sigma_k^2.
\]

Thus $A$ is approximated by $A'$ where $A'$ has rank $l < r$. In fact, it is also true that out of all
matrices of rank $l$, this $A'$ is the one which is closest to $A$ in the Frobenius norm. Here is why.
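Let $B$ be a matrix which has rank $l$. Then from Lemma 13.6.2

\[
\|A - B\|_F^2 = \|U^*(A - B)V\|_F^2 = \|U^*AV - U^*BV\|_F^2
= \left\| \begin{pmatrix} \sigma & 0 \\ 0 & 0 \end{pmatrix} - U^*BV \right\|_F^2
\]

and since the singular values of $A$ decrease from the upper left to the lower right, it follows that for
$B$ to be as close as possible to $A$ in the Frobenius norm,

\[
U^*BV = \begin{pmatrix} \sigma' & 0 \\ 0 & 0 \end{pmatrix}
\]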

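which implies $B = A'$ above. This is really obvious if you look at a simple example. Say

\[
\begin{pmatrix} \sigma & 0 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} 3 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}
\]

for example. Then what rank 1 matrix would be closest to this one in the Frobenius norm? Obviously

\[
\begin{pmatrix} 3 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}.
\]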




13.7 Moore Penrose Inverse* 

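The singular value decomposition also has a very interesting connection to the problem of least
squares solutions. Recall that it was desired to find $\mathbf{x}$ such that $|A\mathbf{x} - \mathbf{y}|$ is as small as possible.
Theorem 13.3.3 and Corollary 13.3.6 show that there is a solution to this problem which can be found by solving the system
$A^*A\mathbf{x} = A^*\mathbf{y}$. Each $\mathbf{x}$ which solves this system solves the minimization problem, as was shown in the
results just mentioned. Now consider this equation for the solutions of the minimization problem in
terms of the singular value decomposition.

\[
\overbrace{V\begin{pmatrix} \sigma & 0 \\ 0 & 0 \end{pmatrix}U^*}^{A^*}\;
\overbrace{U\begin{pmatrix} \sigma & 0 \\ 0 & 0 \end{pmatrix}V^*}^{A}\,\mathbf{x}
= \overbrace{V\begin{pmatrix} \sigma & 0 \\ 0 & 0 \end{pmatrix}U^*}^{A^*}\,\mathbf{y}
\]

Therefore, this yields the following upon using block multiplication and multiplying on the left by
$V^*$.

\[
\begin{pmatrix} \sigma^2 & 0 \\ 0 & 0 \end{pmatrix}V^*\mathbf{x} = \begin{pmatrix} \sigma & 0 \\ 0 & 0 \end{pmatrix}U^*\mathbf{y}. \tag{13.15}
\]

One solution to this equation which is very easy to spot is

\[
\mathbf{x} = V\begin{pmatrix} \sigma^{-1} & 0 \\ 0 & 0 \end{pmatrix}U^*\mathbf{y}. \tag{13.16}
\]

This special $\mathbf{x}$ is denoted by $A^+\mathbf{y}$. The matrix $V\begin{pmatrix} \sigma^{-1} & 0 \\ 0 & 0 \end{pmatrix}U^*$ is denoted by $A^+$. Thus $\mathbf{x}$ just
defined is a solution to the least squares problem of finding the $\mathbf{x}$ such that $A\mathbf{x}$ is as close as possible
to $\mathbf{y}$. Suppose now that $\mathbf{z}$ is some other solution to this least squares problem. Thus from the above,

\[
\begin{pmatrix} \sigma^2 & 0 \\ 0 & 0 \end{pmatrix}V^*\mathbf{z} = \begin{pmatrix} \sigma & 0 \\ 0 & 0 \end{pmatrix}U^*\mathbf{y}
\]

and so, multiplying both sides by $\begin{pmatrix} \sigma^{-2} & 0 \\ 0 & 0 \end{pmatrix}$,

\[
\begin{pmatrix} I_{r \times r} & 0 \\ 0 & 0 \end{pmatrix}V^*\mathbf{z} = \begin{pmatrix} \sigma^{-1} & 0 \\ 0 & 0 \end{pmatrix}U^*\mathbf{y}.
\]

To make $|V^*\mathbf{z}|$ as small as possible, you would have only the first $r$ entries of $V^*\mathbf{z}$ be nonzero since
the later ones will be zeroed out anyway so they are unnecessary. Hence

\[
V^*\mathbf{z} = \begin{pmatrix} \sigma^{-1} & 0 \\ 0 & 0 \end{pmatrix}U^*\mathbf{y}
\]

and consequently

\[
\mathbf{z} = V\begin{pmatrix} \sigma^{-1} & 0 \\ 0 & 0 \end{pmatrix}U^*\mathbf{y} \equiv A^+\mathbf{y}.
\]

However, minimizing $|V^*\mathbf{z}|$ is the same as minimizing $|\mathbf{z}|$ because $V$ is unitary. Hence $A^+\mathbf{y}$ is the
solution to the least squares problem which has smallest norm.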
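When the columns of a real matrix $A$ are linearly independent, $A^TA$ is invertible, the normal equations have exactly one solution, and so that solution must be $A^+\mathbf{y}$. In that special case $A^+ = (A^TA)^{-1}A^T$. The sketch below uses this shortcut instead of the singular value decomposition described above, and only for matrices with two columns. The function names and the sample data are illustrative assumptions.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    """Transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*X)]

def inverse_2x2(M):
    """Inverse of a 2 x 2 matrix by the adjugate formula."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

def pinv_full_column_rank(A):
    """A^+ = (A^T A)^{-1} A^T for a real matrix with 2 independent columns."""
    At = transpose(A)
    return matmul(inverse_2x2(matmul(At, A)), At)

F = Fraction
A = [[F(1), F(0)], [F(0), F(1)], [F(1), F(1)]]   # illustrative 3 x 2 matrix
y = [[F(1)], [F(2)], [F(4)]]                     # illustrative right side

A_plus = pinv_full_column_rank(A)
x = matmul(A_plus, y)                            # least squares solution
```

Here `matmul(A_plus, A)` is the $2 \times 2$ identity and `x` satisfies the normal equations $A^TA\mathbf{x} = A^T\mathbf{y}$ exactly. For matrices whose columns are dependent this formula is not available and the construction through $U$, $V$, and $\sigma$ is needed.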

13.8 Exercises 

1. Here are some matrices. Label according to whether they are symmetric, skew symmetric, or 
orthogonal. If the matrix is orthogonal, determine whether it is proper or improper. 
(a)



(b) 



(c) 




1 

V2 



4 

7 



-3 

-4 





2. Show that every real matrix may be written as the sum of a skew symmetric and a symmetric
matrix. Hint: If $A$ is an $n \times n$ matrix, show that $B \equiv \frac{1}{2}(A - A^T)$ is skew symmetric.

3. Let $\mathbf{x}$ be a vector in $\mathbb{R}^n$ and consider the matrix $I - \frac{2\mathbf{x}\mathbf{x}^T}{\|\mathbf{x}\|^2}$. Show this matrix is both symmetric
and orthogonal.

4. For $U$ an orthogonal matrix, explain why $\|U\mathbf{x}\| = \|\mathbf{x}\|$ for any vector $\mathbf{x}$. Next explain why if
$U$ is an $n \times n$ matrix with the property that $\|U\mathbf{x}\| = \|\mathbf{x}\|$ for all vectors $\mathbf{x}$, then $U$ must be
orthogonal. Thus the orthogonal matrices are exactly those which preserve distance.

5. A quadratic form in three variables is an expression of the form $a_1x^2 + a_2y^2 + a_3z^2 + a_4xy +
a_5xz + a_6yz$. Show that every such quadratic form may be written as

\[
\begin{pmatrix} x & y & z \end{pmatrix} A \begin{pmatrix} x \\ y \\ z \end{pmatrix}
\]

where $A$ is a symmetric matrix.

6. Given a quadratic form in three variables, $x$, $y$, and $z$, show there exists an orthogonal matrix
$U$ and variables $x', y', z'$ such that

\[
\begin{pmatrix} x \\ y \\ z \end{pmatrix} = U\begin{pmatrix} x' \\ y' \\ z' \end{pmatrix}
\]

with the property that in terms of the new variables, the quadratic form is

\[
\lambda_1(x')^2 + \lambda_2(y')^2 + \lambda_3(z')^2
\]

where the numbers $\lambda_1$, $\lambda_2$, and $\lambda_3$ are the eigenvalues of the matrix $A$ in Problem 5.

7. If $A$ is a symmetric invertible matrix, is it always the case that $A^{-1}$ must be symmetric also?
How about $A^k$ for $k$ a positive integer? Explain.

8. If A, B are symmetric matrices, does it follow that AB is also symmetric? 

9. Suppose A, B are symmetric and AB = BA. Does it follow that AB is symmetric? 

10. Here are some matrices. What can you say about the eigenvalues of these matrices just by 
looking at them? 


(a) j -1 
1 



(b) 









11. Find the eigenvalues and eigenvectors of the matrix

\[
\begin{pmatrix} c & 0 & 0 \\ 0 & 0 & -b \\ 0 & b & 0 \end{pmatrix}.
\]

Here $b, c$ are real numbers.

12. Find the eigenvalues and eigenvectors of the matrix

\[
\begin{pmatrix} c & 0 & 0 \\ 0 & a & -b \\ 0 & b & a \end{pmatrix}.
\]

Here $a, b, c$ are real numbers.

13. Find the eigenvalues and an orthonormal basis of eigenvectors for $A$.

\[
A = \begin{pmatrix} 11 & -1 & -4 \\ -1 & 11 & -4 \\ -4 & -4 & 14 \end{pmatrix}
\]

Hint: Two eigenvalues are 12 and 18.

14. Find the eigenvalues and an orthonormal basis of eigenvectors for $A$.

Hint: One eigenvalue is 3.

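15. Show that if $A$ is a real symmetric matrix and $\lambda$ and $\mu$ are two different eigenvalues, then if $\mathbf{x}$
is an eigenvector for $\lambda$ and $\mathbf{y}$ is an eigenvector for $\mu$, then $\mathbf{x} \cdot \mathbf{y} = 0$. Also all eigenvalues are
real. Supply reasons for each step in the following argument. First

\[
\lambda\mathbf{x}^T\overline{\mathbf{x}} = (A\mathbf{x})^T\overline{\mathbf{x}} = \mathbf{x}^TA\overline{\mathbf{x}} = \mathbf{x}^T\overline{A\mathbf{x}} = \mathbf{x}^T\overline{\lambda\mathbf{x}} = \overline{\lambda}\mathbf{x}^T\overline{\mathbf{x}}
\]

and so $\lambda = \overline{\lambda}$. This shows that all eigenvalues are real. It follows all the eigenvectors are real.
Why? Now let $\mathbf{x}$, $\mathbf{y}$, $\mu$ and $\lambda$ be given as above.

\[
\lambda(\mathbf{x} \cdot \mathbf{y}) = \lambda\mathbf{x} \cdot \mathbf{y} = A\mathbf{x} \cdot \mathbf{y} = \mathbf{x} \cdot A\mathbf{y} = \mathbf{x} \cdot \mu\mathbf{y} = \overline{\mu}(\mathbf{x} \cdot \mathbf{y}) = \mu(\mathbf{x} \cdot \mathbf{y})
\]

and so

\[
(\lambda - \mu)\,\mathbf{x} \cdot \mathbf{y} = 0.
\]

Since $\lambda \neq \mu$, it follows $\mathbf{x} \cdot \mathbf{y} = 0$.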

16. Suppose $U$ is an orthogonal $n \times n$ matrix. Explain why $\operatorname{rank}(U) = n$.

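17. Show that if $A$ is a Hermitian matrix and $\lambda$ and $\mu$ are two different eigenvalues, then if $\mathbf{x}$ is
an eigenvector for $\lambda$ and $\mathbf{y}$ is an eigenvector for $\mu$, then $\mathbf{x} \cdot \mathbf{y} = 0$. Also all eigenvalues are real.
Supply reasons for each step in the following argument. First

\[
\lambda\,\mathbf{x} \cdot \mathbf{x} = A\mathbf{x} \cdot \mathbf{x} = \mathbf{x} \cdot A\mathbf{x} = \mathbf{x} \cdot \lambda\mathbf{x} = \overline{\lambda}\,\mathbf{x} \cdot \mathbf{x}
\]

and so $\lambda = \overline{\lambda}$. This shows that all eigenvalues are real. Now let $\mathbf{x}$, $\mathbf{y}$, $\mu$ and $\lambda$ be given as
above.

\[
\lambda(\mathbf{x} \cdot \mathbf{y}) = \lambda\mathbf{x} \cdot \mathbf{y} = A\mathbf{x} \cdot \mathbf{y} = \mathbf{x} \cdot A\mathbf{y} = \mathbf{x} \cdot \mu\mathbf{y} = \overline{\mu}(\mathbf{x} \cdot \mathbf{y}) = \mu(\mathbf{x} \cdot \mathbf{y})
\]

and so

\[
(\lambda - \mu)\,\mathbf{x} \cdot \mathbf{y} = 0.
\]

Since $\lambda \neq \mu$, it follows $\mathbf{x} \cdot \mathbf{y} = 0$.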

18. Show that the eigenvalues and eigenvectors of a real matrix occur in conjugate pairs. 

19. If a real matrix $A$ has all real eigenvalues, does it follow that $A$ must be symmetric? If so,
explain why and if not, give an example to the contrary.

20. Suppose A is a 3 x 3 symmetric matrix and you have found two eigenvectors which form an 
orthonormal set. Explain why their cross product is also an eigenvector. 

21. Study the definition of an orthonormal set of vectors. Write it from memory. 

22. Determine which of the following sets of vectors are orthonormal sets. Justify your answer. 

(a) {(1,1), (1,-1)} 

( c ) 1(1 2 2) (=1 =1 2\ (2 =2 l\j 
VW U3'3'3/'V3' 3'3/'V3' 3'3/J 

23. Show that if $\{\mathbf{u}_1, \cdots, \mathbf{u}_n\}$ is an orthonormal set of vectors in $\mathbb{F}^n$, then it is a basis. Hint: It
was shown earlier that this is a linearly independent set. If you wish, replace $\mathbb{F}^n$ with $\mathbb{R}^n$. Do
this version if you do not know the dot product for vectors in $\mathbb{C}^n$.

24. Fill in the missing entries to make the matrix orthogonal. 

/ -1 -1 j_ \ 

' 72 v/6 V3 ^ 



1 

V2 



\ 



3 



/ 



25. Fill in the missing entries to make the matrix orthogonal. 

2 

3 " 



26. Fill in the missing entries to make the matrix orthogonal. 



(\ 



2 



\ 



V 



^^5 



/ 



27. Find the eigenvalues and an orthonormal basis of eigenvectors for A. Diagonalize A by finding 
an orthogonal matrix U and a diagonal matrix D such that U T AU = D. 



Hint: One eigenvalue is -2. 
28. Find the eigenvalues and an orthonormal basis of eigenvectors for $A$. Diagonalize $A$ by finding
an orthogonal matrix $U$ and a diagonal matrix $D$ such that $U^TAU = D$.

\[
A = \begin{pmatrix} 17 & -7 & -4 \\ -7 & 17 & -4 \\ -4 & -4 & 14 \end{pmatrix}
\]

Hint: Two eigenvalues are 18 and 24.

29. Find the eigenvalues and an orthonormal basis of eigenvectors for A. Diagonalize A by finding 
an orthogonal matrix U and a diagonal matrix D such that U T AU = D. 



A 



Hint: Two eigenvalues are 12 and 18. 

30. Find the eigenvalues and an orthonormal basis of eigenvectors for A. Diagonalize A by finding 
an orthogonal matrix U and a diagonal matrix D such that U T AU = D. 




( 



A = 



5 
3 


^VV5 


^ \ 


^VV5 


14 
5 


-^ 


f,^ 


-&V6 


7 
15 
Hint: The eigenvalues are $-3$, $-2$, $1$.

31. Find the eigenvalues and an orthonormal basis of eigenvectors for A. Diagonalize A by finding 
an orthogonal matrix U and a diagonal matrix D such that U T AU = D. 

( 3 \ 
' 2 I \ 

U 2 2 



V 



1 3 
U 2 2 



/ 



32. Find the eigenvalues and an orthonormal basis of eigenvectors for A. Diagonalize A by finding 
an orthogonal matrix U and a diagonal matrix D such that U T AU = D. 



A = 




33. Find the eigenvalues and an orthonormal basis of eigenvectors for A. Diagonalize A by finding 
an orthogonal matrix U and a diagonal matrix D such that U T AU = D. 
4
3 

±V3V2 1 

|V2 -|V3 



r^V^ ±>/2 \ 



■|V5 

5 
3 



Hint: The eigenvalues are 0, 2, 2 where 2 is listed twice because it is a root of multiplicity 2. 

34. Find the eigenvalues and an orthonormal basis of eigenvectors for A. Diagonalize A by finding 
an orthogonal matrix U and a diagonal matrix D such that U T AU = D. 

( 



V 



1 \y/Zy/2 


IVsVq \ 


IV3V2 1 


^V2VQ 


IV3V& ^V2V& 


1 

2 



/ 



Hint: The eigenvalues are 2, 1,0. 
35. Find the eigenvalues and an orthonormal basis of eigenvectors for the matrix 



/ 



V 



;\/3\/2 



-^VsVq 



3 

2 



7 - 8 V3V&\ 



18 



12 



V2V6 



J 



Hint: The eigenvalues are 1, 2, —2. 
36. Find the eigenvalues and an orthonormal basis of eigenvectors for the matrix 



/ 



V 



1 

2 


-|x/6x/5 


h^ \ 


^VV5 


7 
5 


-\yft 


ToVS 


-*V6 


9 
10 



/ 



Hint: The eigenvalues are —1,2,-1 where —1 is listed twice because it has multiplicity 2 as 
a zero of the characteristic equation. 

37. Explain why a matrix A is symmetric if and only if there exists an orthogonal matrix U such 
that A = U T DU for D a diagonal matrix. 

38. The proof of Theorem 13.3.3 concluded with the following observation. If $-ta + t^2b \geq 0$ for
all $t \in \mathbb{R}$ and $b \geq 0$, then $a = 0$. Why is this so?

39. Using Schur's theorem, show that whenever A is an n x n matrix, det (A) equals the product 
of the eigenvalues of A. 

40. In the proof of Theorem 13.3.7 the following argument was used. If $\mathbf{x} \cdot \mathbf{w} = 0$ for all $\mathbf{w} \in \mathbb{R}^n$,
then $\mathbf{x} = \mathbf{0}$. Why is this so?

41. Using Corollary 13.3.8 show that a real $m \times n$ matrix is onto if and only if its transpose is one
to one.

42. Suppose A is a 3 x 2 matrix. Is it possible that A T is one to one? What does this say about 
A being onto? Prove your answer. 

43. Find the least squares solution to the following system.

\[
\begin{aligned}
x + 2y &= 1 \\
2x + 3y &= 2 \\
3x + 5y &= 4
\end{aligned}
\]

44. You are doing experiments and have obtained the ordered pairs, 

(0,1) ,(1,2) ,(2, 3.5) ,(3, 4) 

Find $m$ and $b$ such that $y = mx + b$ approximates these four points as well as possible. Now
do the same thing for $y = ax^2 + bx + c$, finding $a$, $b$, and $c$ to give the best approximation.

45. Suppose you have several ordered triples, $(x_i, y_i, z_i)$. Describe how to find a polynomial,

\[
z = a + bx + cy + dxy + ex^2 + fy^2
\]

for example giving the best fit to the given ordered triples. Is there any reason you have to 
use a polynomial? Would similar approaches work for other combinations of functions just as 
well? 

46. Find an orthonormal basis for the spans of the following sets of vectors. 

(a) (3, -4,0), (7, -1,0), (1,7,1). 

(b) (3,0, -4), (11, 0,2), (1,1, 7) 

(c) (3, 0,-4), (5, 0,10), (-7, 1,1) 

47. Using the Gram Schmidt process or the QR factorization, find an orthonormal basis for the 
span of the vectors, $(1, 2, 1)$, $(2, -1, 3)$, and $(1, 0, 0)$. 

48. Using the Gram Schmidt process or the QR factorization, find an orthonormal basis for the 
span of the vectors, $(1, 2, 1, 0)$, $(2, -1, 3, 1)$, and $(1, 0, 0, 1)$. 

49. The set, $V \equiv \{(x, y, z) : 2x + 3y - z = 0\}$ is a subspace of $\mathbb{R}^3$. Find an orthonormal basis for 
this subspace. 

50. The two level surfaces, $2x + 3y - z + w = 0$ and $3x - y + z + 2w = 0$ intersect in a subspace 
of $\mathbb{R}^4$. Find a basis for this subspace. Next find an orthonormal basis for this subspace. 

51. Let $A, B$ be $m \times n$ matrices. Define an inner product on the set of $m \times n$ matrices by 

\[ (A, B)_F \equiv \operatorname{trace}(AB^*). \]

Show this is an inner product satisfying all the inner product axioms. Recall for $M$ an $n \times n$ 
matrix, $\operatorname{trace}(M) \equiv \sum_{i=1}^n M_{ii}$. The resulting norm, $\|\cdot\|_F$ is called the Frobenius norm and it 
can be used to measure the distance between two matrices. 

52. Let $A$ be an $m \times n$ matrix. Show 

\[ \|A\|_F^2 \equiv (A, A)_F = \sum_j \sigma_j^2 \]

where the $\sigma_j$ are the singular values of $A$. 

53. The trace of an $n \times n$ matrix $M$ is defined as $\sum_i M_{ii}$. In other words it is the sum of the entries 
on the main diagonal. If $A, B$ are $n \times n$ matrices, show $\operatorname{trace}(AB) = \operatorname{trace}(BA)$. Now explain 
why if $A = S^{-1}BS$ it follows $\operatorname{trace}(A) = \operatorname{trace}(B)$. Hint: For the first part, write these in 
terms of components of the matrices and it just falls out. 

54. Using Problem 53 and Schur's theorem, show that the trace of an n x n matrix equals the sum 
of the eigenvalues. 

55. If $A$ is a general $n \times n$ matrix having possibly repeated eigenvalues, show there is a sequence 
$\{A_k\}$ of $n \times n$ matrices having distinct eigenvalues which has the property that the $ij$th entry 
of $A_k$ converges to the $ij$th entry of $A$ for all $ij$. Hint: Use Schur's theorem. 

56. Prove the Cayley Hamilton theorem as follows. First suppose $A$ has a basis of eigenvectors 
$\{v_k\}_{k=1}^n$, $Av_k = \lambda_k v_k$. Let $p(\lambda)$ be the characteristic polynomial. Show $p(A)v_k = p(\lambda_k)v_k = 0$. 
Then since $\{v_k\}$ is a basis, it follows $p(A)x = 0$ for all $x$ and so $p(A) = 0$. Next in the 
general case, use Problem 55 to obtain a sequence $\{A_k\}$ of matrices whose entries converge 
to the entries of $A$ such that $A_k$ has $n$ distinct eigenvalues and therefore by Theorem 12.1.13 
$A_k$ has a basis of eigenvectors. Therefore, from the first part and for $p_k(\lambda)$ the characteristic 
polynomial for $A_k$, it follows $p_k(A_k) = 0$. Now explain why and the sense in which 

\[ \lim_{k \to \infty} p_k(A_k) = p(A). \]

57. Show that the Moore Penrose inverse $A^+$ satisfies the following conditions. 

\[ AA^+A = A, \quad A^+AA^+ = A^+, \quad A^+A,\ AA^+ \text{ are Hermitian.} \]

Next show that if $A_0$ satisfies the above conditions, then it must be the Moore Penrose inverse 
and that if $A$ is an $n \times n$ invertible matrix, then $A^{-1}$ satisfies the above conditions. Thus 
the Moore Penrose inverse generalizes the usual notion of inverse but does not contradict it. 
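Hint: Let 

\[ U^*AV = \Sigma \equiv \begin{pmatrix} \sigma & 0 \\ 0 & 0 \end{pmatrix} \]

and suppose 

\[ V^*A_0U = \begin{pmatrix} P & Q \\ R & S \end{pmatrix} \]

where $P$ is the same size as $\sigma$. Now use the conditions to identify $P = \sigma^{-1}$, $Q = 0$ etc. 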

58. Find the least squares solution to 

\[ \begin{pmatrix} 1 & 1 \\ 1 & 1 \\ 1 & 1+\varepsilon \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} a \\ b \\ c \end{pmatrix}. \]

Next suppose $\varepsilon$ is so small that all $\varepsilon^2$ terms are ignored by the computer but the terms of 
order $\varepsilon$ are not ignored. Show the least squares equations in this case reduce to 

\[ \begin{pmatrix} 3 & 3+\varepsilon \\ 3+\varepsilon & 3+2\varepsilon \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} a+b+c \\ a+b+(1+\varepsilon)c \end{pmatrix}. \]

Find the solution to this and compare the $y$ values of the two solutions. Show that one of these 
is $-2$ times the other. This illustrates a problem with the technique for finding least squares 
solutions presented as the solutions to $A^*Ax = A^*y$. One way of dealing with this problem 
is to use the QR factorization. This is illustrated in the next problem. It turns out that this 
helps alleviate some of the round off difficulties of the above. 

59. Show that the equations $A^*Ax = A^*y$ can be written as $R^*Rx = R^*Q^*y$ where $R$ is upper 
triangular and $R^*$ is lower triangular. Explain how to solve this system efficiently. Hint: You 
first find $Rx$ and then you find $x$ which will not be hard because $R$ is upper triangular. 

60. Show that $A^+ = (A^*A)^+A^*$. Hint: You might use the description of $A^+$ in terms of the 
singular value decomposition. 

Numerical Methods For Solving 
Linear Systems 

14.1 Iterative Methods For Linear Systems 

Consider the problem of solving the equation 

Ax = b (14.1) 

where A is an n x n matrix. In many applications, the matrix A is huge and composed mainly of 
zeros. For such matrices, the method of Gauss elimination (row operations) is not a good way to 
solve the system because the row operations can destroy the zeros and storing all those zeros takes 
a lot of room in a computer. These systems are called sparse. To solve them it is common to use 
an iterative technique. The idea is to obtain a sequence of approximate solutions which get close to 
the true solution after a sufficient number of iterations. 

Definition 14.1.1 Let $\{x_k\}_{k=1}^{\infty}$ be a sequence of vectors in $\mathbb{F}^n$. Say 

\[ x_k = \left(x_1^k, \cdots, x_n^k\right). \]

Then this sequence is said to converge to the vector $x = (x_1, \cdots, x_n) \in \mathbb{F}^n$, written as 

\[ \lim_{k \to \infty} x_k = x \]

if for each $j = 1, 2, \cdots, n$, 

\[ \lim_{k \to \infty} x_j^k = x_j. \]

In words, the sequence converges if the entries of the vectors in the sequence converge to the 
corresponding entries of $x$. 

Example 14.1.2 Consider $x_k = \left(\sin(1/k),\ \frac{k^2}{1+k^2},\ \ln\left(\frac{1+k^2}{k^2}\right)\right)$. Find $\lim_{k \to \infty} x_k$. 

From the above definition, this limit is the vector $(0, 1, 0)$ because 

\[ \lim_{k \to \infty} \sin(1/k) = 0, \quad \lim_{k \to \infty} \frac{k^2}{1+k^2} = 1, \quad \text{and} \quad \lim_{k \to \infty} \ln\left(\frac{1+k^2}{k^2}\right) = 0. \]

A more complete mathematical explanation is given in Linear Algebra. 

14.1.1 The Jacobi Method 

The first technique to be discussed here is the Jacobi method which is described in the following 
definition. In this technique, you have a sequence of vectors, $\{x^k\}$ which converge to the solution to 
the linear system of equations and to get the $i$th component of $x^{k+1}$, you use all the components 
of $x^k$ except for the $i$th. The precise description follows. 

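Definition 14.1.3 The Jacobi iterative technique, also called the method of simultaneous 
corrections, is defined as follows. Let $x^1$ be an initial vector, say the zero vector or some other vector. 
The method generates a succession of vectors, $x^2, x^3, x^4, \cdots$ and hopefully this sequence of vectors 
will converge to the solution to 14.1. The vectors in this list are called iterates and they are obtained 
according to the following procedure. Letting $A = (a_{ij})$, 

\[ a_{ii} x_i^{r+1} = -\sum_{j \neq i} a_{ij} x_j^r + b_i. \tag{14.2} \]

In terms of matrices, letting 

\[ A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix} \]

The iterates are defined as 

\[
\begin{pmatrix} a_{11} & 0 & \cdots & 0 \\ 0 & a_{22} & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & a_{nn} \end{pmatrix}
\begin{pmatrix} x_1^{r+1} \\ x_2^{r+1} \\ \vdots \\ x_n^{r+1} \end{pmatrix}
= -\begin{pmatrix} 0 & a_{12} & \cdots & a_{1n} \\ a_{21} & 0 & \ddots & \vdots \\ \vdots & \ddots & \ddots & a_{n-1\,n} \\ a_{n1} & \cdots & a_{n\,n-1} & 0 \end{pmatrix}
\begin{pmatrix} x_1^r \\ x_2^r \\ \vdots \\ x_n^r \end{pmatrix}
+ \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix}
\tag{14.3}
\]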

The matrix on the left in 14.3 is obtained by retaining the main diagonal of A and setting every 
other entry equal to zero. The matrix on the right in 14.3 is obtained from A by setting every 
diagonal entry equal to zero and retaining all the other entries unchanged. 
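
The componentwise formula 14.2 translates almost line for line into a program. The following is only a minimal sketch in plain Python, not a production solver. The function name `jacobi`, the fixed number of sweeps, and the choice of the zero vector for $x^1$ are illustrative assumptions. The matrix and right side are those of Example 14.1.4.

```python
def jacobi(A, b, x0, sweeps):
    """Return the list of iterates [x^1, x^2, ...] produced by formula 14.2.

    Assumes every diagonal entry A[i][i] is nonzero.
    """
    n = len(A)
    x = list(x0)
    iterates = [x]
    for _ in range(sweeps):
        # Each new component uses only the previous iterate (simultaneous corrections).
        x_new = [
            (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
            for i in range(n)
        ]
        iterates.append(x_new)
        x = x_new
    return iterates


# System of Example 14.1.4.
A = [[3, 1, 0, 0],
     [1, 4, 1, 0],
     [0, 2, 5, 1],
     [0, 0, 2, 4]]
b = [1, 2, 3, 4]

iterates = jacobi(A, b, [0.0, 0.0, 0.0, 0.0], 4)
x5 = iterates[-1]  # x^5 when x^1 is the zero vector
print(x5)
```

Raising the number of sweeps gives iterates which are closer to the solution found by row operations.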
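
Example 14.1.4 Use the Jacobi method to solve the system 

\[ \begin{pmatrix} 3 & 1 & 0 & 0 \\ 1 & 4 & 1 & 0 \\ 0 & 2 & 5 & 1 \\ 0 & 0 & 2 & 4 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \\ 3 \\ 4 \end{pmatrix} \]

In terms of the matrices, the Jacobi iteration is of the form 

\[ \begin{pmatrix} 3 & 0 & 0 & 0 \\ 0 & 4 & 0 & 0 \\ 0 & 0 & 5 & 0 \\ 0 & 0 & 0 & 4 \end{pmatrix} \begin{pmatrix} x_1^{r+1} \\ x_2^{r+1} \\ x_3^{r+1} \\ x_4^{r+1} \end{pmatrix} = -\begin{pmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 0 & 2 & 0 & 1 \\ 0 & 0 & 2 & 0 \end{pmatrix} \begin{pmatrix} x_1^r \\ x_2^r \\ x_3^r \\ x_4^r \end{pmatrix} + \begin{pmatrix} 1 \\ 2 \\ 3 \\ 4 \end{pmatrix}. \]

Now iterate this starting with 

\[ x^1 \equiv \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}. \]

Thus 

\[ \begin{pmatrix} 3 & 0 & 0 & 0 \\ 0 & 4 & 0 & 0 \\ 0 & 0 & 5 & 0 \\ 0 & 0 & 0 & 4 \end{pmatrix} \begin{pmatrix} x_1^2 \\ x_2^2 \\ x_3^2 \\ x_4^2 \end{pmatrix} = -\begin{pmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 0 & 2 & 0 & 1 \\ 0 & 0 & 2 & 0 \end{pmatrix} \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix} + \begin{pmatrix} 1 \\ 2 \\ 3 \\ 4 \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \\ 3 \\ 4 \end{pmatrix}. \]

Solving this system yields 

\[ x^2 = \begin{pmatrix} .33333333 \\ .5 \\ .6 \\ 1.0 \end{pmatrix}. \]

Then you use $x^2$ to find $x^3 = \left(x_1^3, x_2^3, x_3^3, x_4^3\right)^T$. 

\[ \begin{pmatrix} 3 & 0 & 0 & 0 \\ 0 & 4 & 0 & 0 \\ 0 & 0 & 5 & 0 \\ 0 & 0 & 0 & 4 \end{pmatrix} \begin{pmatrix} x_1^3 \\ x_2^3 \\ x_3^3 \\ x_4^3 \end{pmatrix} = -\begin{pmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 0 & 2 & 0 & 1 \\ 0 & 0 & 2 & 0 \end{pmatrix} \begin{pmatrix} .33333333 \\ .5 \\ .6 \\ 1.0 \end{pmatrix} + \begin{pmatrix} 1 \\ 2 \\ 3 \\ 4 \end{pmatrix} = \begin{pmatrix} .5 \\ 1.0666667 \\ 1.0 \\ 2.8 \end{pmatrix} \]

The solution is 

\[ x^3 = \begin{pmatrix} .16666667 \\ .26666668 \\ .2 \\ .7 \end{pmatrix}. \]

Now use this as the new data to find $x^4 = \left(x_1^4, x_2^4, x_3^4, x_4^4\right)^T$. 

\[ \begin{pmatrix} 3 & 0 & 0 & 0 \\ 0 & 4 & 0 & 0 \\ 0 & 0 & 5 & 0 \\ 0 & 0 & 0 & 4 \end{pmatrix} \begin{pmatrix} x_1^4 \\ x_2^4 \\ x_3^4 \\ x_4^4 \end{pmatrix} = -\begin{pmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 0 & 2 & 0 & 1 \\ 0 & 0 & 2 & 0 \end{pmatrix} \begin{pmatrix} .16666667 \\ .26666668 \\ .2 \\ .7 \end{pmatrix} + \begin{pmatrix} 1 \\ 2 \\ 3 \\ 4 \end{pmatrix} = \begin{pmatrix} .73333332 \\ 1.6333333 \\ 1.7666666 \\ 3.6 \end{pmatrix} \]

Thus you find 

\[ x^4 = \begin{pmatrix} .24444444 \\ .40833333 \\ .35333332 \\ .9 \end{pmatrix}. \]

Then another iteration for $x^5$ gives 

\[ \begin{pmatrix} 3 & 0 & 0 & 0 \\ 0 & 4 & 0 & 0 \\ 0 & 0 & 5 & 0 \\ 0 & 0 & 0 & 4 \end{pmatrix} \begin{pmatrix} x_1^5 \\ x_2^5 \\ x_3^5 \\ x_4^5 \end{pmatrix} = -\begin{pmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 0 & 2 & 0 & 1 \\ 0 & 0 & 2 & 0 \end{pmatrix} \begin{pmatrix} .24444444 \\ .40833333 \\ .35333332 \\ .9 \end{pmatrix} + \begin{pmatrix} 1 \\ 2 \\ 3 \\ 4 \end{pmatrix} = \begin{pmatrix} .59166667 \\ 1.4022222 \\ 1.2833333 \\ 3.2933334 \end{pmatrix} \]

and so 

\[ x^5 = \begin{pmatrix} .19722222 \\ .35055555 \\ .25666666 \\ .82333335 \end{pmatrix}. \]

The solution to the system of equations obtained by row operations is 

\[ \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = \begin{pmatrix} .206 \\ .379 \\ .275 \\ .862 \end{pmatrix} \]

so already after only five iterations the iterates are pretty close to the true solution. How well does 
it work? 

\[ \begin{pmatrix} 3 & 1 & 0 & 0 \\ 1 & 4 & 1 & 0 \\ 0 & 2 & 5 & 1 \\ 0 & 0 & 2 & 4 \end{pmatrix} \begin{pmatrix} .19722222 \\ .35055555 \\ .25666666 \\ .82333335 \end{pmatrix} = \begin{pmatrix} .94222221 \\ 1.8561111 \\ 2.8077778 \\ 3.8066667 \end{pmatrix} \approx \begin{pmatrix} 1.0 \\ 2.0 \\ 3.0 \\ 4.0 \end{pmatrix} \]

A few more iterates will yield a better solution. 

14.1.2 The Gauss Seidel Method 

The Gauss Seidel method differs from the Jacobi method in using $x_j^{k+1}$ for all $j < i$ in going from 
$x^k$ to $x^{k+1}$. This is why it is called the method of successive corrections. The precise description of 
this method is in the following definition. 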

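Definition 14.1.5 The Gauss Seidel method, also called the method of successive corrections 
is given as follows. For $A = (a_{ij})$, the iterates for the problem $Ax = b$ are obtained according to the 
formula 

\[ \sum_{j=1}^{i} a_{ij} x_j^{r+1} = -\sum_{j=i+1}^{n} a_{ij} x_j^r + b_i. \tag{14.4} \]

In terms of matrices, letting 

\[ A = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{pmatrix} \]

The iterates are defined as 

\[
\begin{pmatrix} a_{11} & 0 & \cdots & 0 \\ a_{21} & a_{22} & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ a_{n1} & \cdots & a_{n\,n-1} & a_{nn} \end{pmatrix}
\begin{pmatrix} x_1^{r+1} \\ x_2^{r+1} \\ \vdots \\ x_n^{r+1} \end{pmatrix}
= -\begin{pmatrix} 0 & a_{12} & \cdots & a_{1n} \\ 0 & 0 & \ddots & \vdots \\ \vdots & \ddots & \ddots & a_{n-1\,n} \\ 0 & \cdots & 0 & 0 \end{pmatrix}
\begin{pmatrix} x_1^r \\ x_2^r \\ \vdots \\ x_n^r \end{pmatrix}
+ \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix}
\tag{14.5}
\]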

In words, you set every entry in the original matrix which is strictly above the main diagonal 
equal to zero to obtain the matrix on the left. To get the matrix on the right, you set every entry 
of A which is on or below the main diagonal equal to zero. Using the iteration procedure of 14.4 
directly, the Gauss Seidel method makes use of the very latest information which is available at that 
stage of the computation. 
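
Formula 14.4 says that within one sweep, the components already computed are used immediately. Here is a minimal sketch of that idea in plain Python. The function name `gauss_seidel` and the fixed number of sweeps are illustrative assumptions, and the test system is the one from the Jacobi example.

```python
def gauss_seidel(A, b, x0, sweeps):
    """Return the list of iterates [x^1, x^2, ...] produced by formula 14.4.

    Assumes every diagonal entry A[i][i] is nonzero.
    """
    n = len(A)
    x = list(x0)
    iterates = [list(x)]
    for _ in range(sweeps):
        for i in range(n):
            # x[j] for j < i already holds the new value from this sweep,
            # while x[j] for j > i still holds the value from the previous sweep.
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - s) / A[i][i]
        iterates.append(list(x))
    return iterates


A = [[3, 1, 0, 0],
     [1, 4, 1, 0],
     [0, 2, 5, 1],
     [0, 0, 2, 4]]
b = [1, 2, 3, 4]

iterates = gauss_seidel(A, b, [0.0, 0.0, 0.0, 0.0], 3)
x4 = iterates[-1]  # x^4 when x^1 is the zero vector
print(x4)
```

The only difference from a Jacobi sweep is that the vector is overwritten in place, so no second copy of the iterate is needed.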

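The following example is the same as the example used to illustrate the Jacobi method. 

Example 14.1.6 Use the Gauss Seidel method to solve the system 

\[ \begin{pmatrix} 3 & 1 & 0 & 0 \\ 1 & 4 & 1 & 0 \\ 0 & 2 & 5 & 1 \\ 0 & 0 & 2 & 4 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \\ 3 \\ 4 \end{pmatrix} \]

In terms of matrices, this procedure is 

\[ \begin{pmatrix} 3 & 0 & 0 & 0 \\ 1 & 4 & 0 & 0 \\ 0 & 2 & 5 & 0 \\ 0 & 0 & 2 & 4 \end{pmatrix} \begin{pmatrix} x_1^{r+1} \\ x_2^{r+1} \\ x_3^{r+1} \\ x_4^{r+1} \end{pmatrix} = -\begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} x_1^r \\ x_2^r \\ x_3^r \\ x_4^r \end{pmatrix} + \begin{pmatrix} 1 \\ 2 \\ 3 \\ 4 \end{pmatrix}. \]

As before, let $x^1$ be the zero vector. Thus the first iteration is to solve 

\[ \begin{pmatrix} 3 & 0 & 0 & 0 \\ 1 & 4 & 0 & 0 \\ 0 & 2 & 5 & 0 \\ 0 & 0 & 2 & 4 \end{pmatrix} \begin{pmatrix} x_1^2 \\ x_2^2 \\ x_3^2 \\ x_4^2 \end{pmatrix} = -\begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix} + \begin{pmatrix} 1 \\ 2 \\ 3 \\ 4 \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \\ 3 \\ 4 \end{pmatrix} \]

Hence 

\[ x^2 = \begin{pmatrix} .33333333 \\ .41666667 \\ .43333333 \\ .78333333 \end{pmatrix} \]

Thus $x^3 = \left(x_1^3, x_2^3, x_3^3, x_4^3\right)^T$ is given by 

\[ \begin{pmatrix} 3 & 0 & 0 & 0 \\ 1 & 4 & 0 & 0 \\ 0 & 2 & 5 & 0 \\ 0 & 0 & 2 & 4 \end{pmatrix} \begin{pmatrix} x_1^3 \\ x_2^3 \\ x_3^3 \\ x_4^3 \end{pmatrix} = -\begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} .33333333 \\ .41666667 \\ .43333333 \\ .78333333 \end{pmatrix} + \begin{pmatrix} 1 \\ 2 \\ 3 \\ 4 \end{pmatrix} = \begin{pmatrix} .58333333 \\ 1.5666667 \\ 2.2166667 \\ 4.0 \end{pmatrix} \]

And so 

\[ x^3 = \begin{pmatrix} .19444444 \\ .34305556 \\ .30611111 \\ .84694444 \end{pmatrix} \]

Another iteration for $x^4$ involves solving 

\[ \begin{pmatrix} 3 & 0 & 0 & 0 \\ 1 & 4 & 0 & 0 \\ 0 & 2 & 5 & 0 \\ 0 & 0 & 2 & 4 \end{pmatrix} \begin{pmatrix} x_1^4 \\ x_2^4 \\ x_3^4 \\ x_4^4 \end{pmatrix} = -\begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} .19444444 \\ .34305556 \\ .30611111 \\ .84694444 \end{pmatrix} + \begin{pmatrix} 1 \\ 2 \\ 3 \\ 4 \end{pmatrix} = \begin{pmatrix} .65694444 \\ 1.6938889 \\ 2.1530556 \\ 4.0 \end{pmatrix} \]

and so 

\[ x^4 = \begin{pmatrix} .21898148 \\ .36872686 \\ .28312038 \\ .85843981 \end{pmatrix} \]

Recall the answer is 

\[ \begin{pmatrix} .206 \\ .379 \\ .275 \\ .862 \end{pmatrix} \]

so the iterates are already pretty close to the answer. You could continue doing these iterates and 
it appears they converge to the solution. Now consider the following example. 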

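Example 14.1.7 Use the Gauss Seidel method to solve the system 

\[ \begin{pmatrix} 1 & 4 & 0 & 0 \\ 1 & 4 & 1 & 0 \\ 0 & 2 & 5 & 1 \\ 0 & 0 & 2 & 4 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \\ 3 \\ 4 \end{pmatrix} \]

The exact solution is given by doing row operations on the augmented matrix. When this is done 
the row echelon form is 

\[ \left(\begin{array}{cccc|c} 1 & 0 & 0 & 0 & 6 \\ 0 & 1 & 0 & 0 & -\frac{5}{4} \\ 0 & 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 1 & \frac{1}{2} \end{array}\right) \]

and so the solution is 

\[ \begin{pmatrix} 6 \\ -\frac{5}{4} \\ 1 \\ \frac{1}{2} \end{pmatrix} = \begin{pmatrix} 6.0 \\ -1.25 \\ 1.0 \\ .5 \end{pmatrix} \]

The Gauss Seidel iterations are of the form 

\[ \begin{pmatrix} 1 & 0 & 0 & 0 \\ 1 & 4 & 0 & 0 \\ 0 & 2 & 5 & 0 \\ 0 & 0 & 2 & 4 \end{pmatrix} \begin{pmatrix} x_1^{r+1} \\ x_2^{r+1} \\ x_3^{r+1} \\ x_4^{r+1} \end{pmatrix} = -\begin{pmatrix} 0 & 4 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} x_1^r \\ x_2^r \\ x_3^r \\ x_4^r \end{pmatrix} + \begin{pmatrix} 1 \\ 2 \\ 3 \\ 4 \end{pmatrix} \]

and so, multiplying by the inverse of the matrix on the left, the iteration reduces to the following in 
terms of matrix multiplication. 

\[ x^{r+1} = -\begin{pmatrix} 0 & 4 & 0 & 0 \\ 0 & -1 & \frac{1}{4} & 0 \\ 0 & \frac{2}{5} & -\frac{1}{10} & \frac{1}{5} \\ 0 & -\frac{1}{5} & \frac{1}{20} & -\frac{1}{10} \end{pmatrix} x^r + \begin{pmatrix} 1 \\ \frac{1}{4} \\ \frac{1}{2} \\ \frac{3}{4} \end{pmatrix}. \]

This time, we will pick an initial vector close to the answer. Let 

\[ x^1 = \begin{pmatrix} 6 \\ -1 \\ 1 \\ \frac{1}{2} \end{pmatrix} \]

This is very close to the answer. Now let's see what the Gauss Seidel iteration does to it. 

\[ x^2 = -\begin{pmatrix} 0 & 4 & 0 & 0 \\ 0 & -1 & \frac{1}{4} & 0 \\ 0 & \frac{2}{5} & -\frac{1}{10} & \frac{1}{5} \\ 0 & -\frac{1}{5} & \frac{1}{20} & -\frac{1}{10} \end{pmatrix} \begin{pmatrix} 6 \\ -1 \\ 1 \\ \frac{1}{2} \end{pmatrix} + \begin{pmatrix} 1 \\ \frac{1}{4} \\ \frac{1}{2} \\ \frac{3}{4} \end{pmatrix} = \begin{pmatrix} 5.0 \\ -1.0 \\ .9 \\ .55 \end{pmatrix} \]

You can't expect to be real close after only one iteration. Let's do another. 

\[ x^3 = -\begin{pmatrix} 0 & 4 & 0 & 0 \\ 0 & -1 & \frac{1}{4} & 0 \\ 0 & \frac{2}{5} & -\frac{1}{10} & \frac{1}{5} \\ 0 & -\frac{1}{5} & \frac{1}{20} & -\frac{1}{10} \end{pmatrix} \begin{pmatrix} 5.0 \\ -1.0 \\ .9 \\ .55 \end{pmatrix} + \begin{pmatrix} 1 \\ \frac{1}{4} \\ \frac{1}{2} \\ \frac{3}{4} \end{pmatrix} = \begin{pmatrix} 5.0 \\ -.975 \\ .88 \\ .56 \end{pmatrix} \]

\[ x^4 = -\begin{pmatrix} 0 & 4 & 0 & 0 \\ 0 & -1 & \frac{1}{4} & 0 \\ 0 & \frac{2}{5} & -\frac{1}{10} & \frac{1}{5} \\ 0 & -\frac{1}{5} & \frac{1}{20} & -\frac{1}{10} \end{pmatrix} \begin{pmatrix} 5.0 \\ -.975 \\ .88 \\ .56 \end{pmatrix} + \begin{pmatrix} 1 \\ \frac{1}{4} \\ \frac{1}{2} \\ \frac{3}{4} \end{pmatrix} = \begin{pmatrix} 4.9 \\ -.945 \\ .866 \\ .567 \end{pmatrix} \]

The iterates seem to be getting farther from the actual solution. Why is the process which worked so 
well in the other examples not working here? A better question might be: Why does either process 
ever work at all? A complete answer to this question is given in more advanced linear algebra books. 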



You can also see it in Linear Algebra. 

Both iterative procedures for solving 

\[ Ax = b \tag{14.6} \]

are of the form 

\[ Bx^{r+1} = -Cx^r + b \]

where $A = B + C$. In the Jacobi procedure, the matrix $C$ was obtained by setting the diagonal of 
$A$ equal to zero and leaving all other entries the same while the matrix $B$ was obtained by making 
every entry of $A$ equal to zero other than the diagonal entries which are left unchanged. In the 
Gauss Seidel procedure, the matrix $B$ was obtained from $A$ by making every entry strictly above 
the main diagonal equal to zero and leaving the others unchanged, and $C$ was obtained from $A$ by 
making every entry on or below the main diagonal equal to zero and leaving the others unchanged. 
Thus in the Jacobi procedure, $B$ is a diagonal matrix while in the Gauss Seidel procedure, $B$ is lower 
triangular. Using matrices to explicitly solve for the iterates, yields 

\[ x^{r+1} = -B^{-1}Cx^r + B^{-1}b. \tag{14.7} \]

This is what you would never have the computer do, but this is what will allow the statement of a 
theorem which gives the condition for convergence of these and all other similar methods. 

Theorem 14.1.8 Let $A = B + C$ and suppose all eigenvalues of $B^{-1}C$ have absolute value less 
than 1. Then the iterates in 14.7 converge to the unique solution of 14.6. 

A complete explanation of this important result is found in more advanced linear algebra books. 
You can also see it in Linear Algebra. It depends on a theorem of Gelfand which is completely 
proved in this reference. Theorem 14.1.8 is very remarkable because it gives an algebraic condition 
for convergence, which is essentially an analytical question. 
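
The condition in Theorem 14.1.8 can be checked numerically for the two Gauss Seidel examples above. The sketch below forms $B^{-1}C$ by forward substitution and then estimates the largest absolute value of an eigenvalue as $\|M^k\|^{1/k}$ for a large $k$, which is the content of Gelfand's formula. All function names, the choice $k = 200$, and the use of the maximum absolute row sum as the matrix norm are illustrative assumptions. This is a rough estimate, not a proof.

```python
def solve_lower(B, v):
    """Solve B y = v by forward substitution (B lower triangular, nonzero diagonal)."""
    n = len(B)
    y = [0.0] * n
    for i in range(n):
        y[i] = (v[i] - sum(B[i][j] * y[j] for j in range(i))) / B[i][i]
    return y


def gauss_seidel_iteration_matrix(A):
    """Return M = B^{-1} C for the Gauss Seidel splitting A = B + C."""
    n = len(A)
    B = [[A[i][j] if j <= i else 0.0 for j in range(n)] for i in range(n)]
    C = [[A[i][j] if j > i else 0.0 for j in range(n)] for i in range(n)]
    # Column k of M is the solution of B m = (column k of C).
    cols = [solve_lower(B, [C[i][k] for i in range(n)]) for k in range(n)]
    return [[cols[k][i] for k in range(n)] for i in range(n)]


def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def max_row_sum(M):
    return max(sum(abs(e) for e in row) for row in M)


def spectral_radius_estimate(M, k=200):
    """Estimate the largest |eigenvalue| of M as ||M^k||^(1/k)."""
    P = M
    for _ in range(k - 1):
        P = matmul(P, M)
    return max_row_sum(P) ** (1.0 / k)


A_good = [[3, 1, 0, 0], [1, 4, 1, 0], [0, 2, 5, 1], [0, 0, 2, 4]]  # Example 14.1.6
A_bad = [[1, 4, 0, 0], [1, 4, 1, 0], [0, 2, 5, 1], [0, 0, 2, 4]]   # Example 14.1.7

rho_good = spectral_radius_estimate(gauss_seidel_iteration_matrix(A_good))
rho_bad = spectral_radius_estimate(gauss_seidel_iteration_matrix(A_bad))
print(rho_good, rho_bad)
```

An estimate below 1 for the first matrix and above 1 for the second is consistent with what the iterates did in the two examples.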

14.2 The Operator Norm* 

Recall that for $x \in \mathbb{C}^n$, 

\[ |x| \equiv \sqrt{(x, x)}. \]

Also recall Theorem 3.2.17 which says that 

\[ |z| \geq 0 \text{ and } |z| = 0 \text{ if and only if } z = 0 \tag{14.8} \]

\[ \text{If } \alpha \text{ is a scalar, } |\alpha z| = |\alpha|\,|z| \tag{14.9} \]

\[ |z + w| \leq |z| + |w|. \tag{14.10} \]

If you have the above axioms holding for $\|\cdot\|$ replacing $|\cdot|$, then $\|\cdot\|$ is called a norm. For example, 
you can easily verify that 

\[ \|x\| \equiv \max\{|x_i|,\ i = 1, \cdots, n : x = (x_1, \cdots, x_n)\} \]

is a norm. However, there are many other norms. 

One important observation is that $x \mapsto \|x\|$ is a continuous function. This follows from the 
observation that from the triangle inequality, 

\[ \|x - y\| + \|y\| \geq \|x\| \]

\[ \|x - y\| + \|x\| = \|y - x\| + \|x\| \geq \|y\| \]

Hence 

\[ \|x\| - \|y\| \leq \|x - y\| \]

\[ \|y\| - \|x\| \leq \|x - y\| \]

and so 

\[ \big|\, \|x\| - \|y\| \,\big| \leq \|x - y\| \]

This section will involve some analysis. If you want to talk about norms, this is inevitable. It 
will need some of the theorems of calculus which are usually neglected. In particular, it needs the 
following result which is a case of the Heine Borel theorem. To see this proved, see any good calculus 
book. 

Theorem 14.2.1 Let $S$ denote the points $x \in \mathbb{F}^n$ such that $|x| = 1$. Then if $\{x_k\}_{k=1}^{\infty}$ is any sequence 
of points of $S$, there exists a subsequence which converges to a point of $S$. 

Definition 14.2.2 Let $A$ be an $m \times n$ matrix. Let $\|\cdot\|_k$ denote a norm on $\mathbb{C}^k$. Then the operator 
norm is defined as follows. 

\[ \|A\| \equiv \max\{\|Ax\|_m : \|x\|_n \leq 1\} \]

Lemma 14.2.3 The operator norm is well defined and is in fact a norm on the vector space of 
$m \times n$ matrices. 

Proof: It has already been observed that the $m \times n$ matrices form a vector space starting on 
Page 99. Why is $\|A\| < \infty$? 

claim: There exists $c > 0$ such that whenever $\|x\| \leq 1$, it follows that $|x| \leq c$. 

Proof of the claim: If not, then there exists $\{x_k\}$ such that $\|x_k\| \leq 1$ but $|x_k| > k$ for 
$k = 1, 2, \cdots$. Then $\left| x_k / |x_k| \right| = 1$ and so by the Heine Borel theorem from calculus, there exists a 
further subsequence, still denoted by $k$ such that 

\[ \left| \frac{x_k}{|x_k|} - y \right| \to 0, \quad |y| = 1. \]

Letting 

\[ \frac{x_k}{|x_k|} = \sum_{i=1}^n a_i^k e_i, \quad y = \sum_{i=1}^n a_i e_i, \]

It follows that $a^k \to a$ in $\mathbb{F}^n$. Hence 

\[ \left\| \frac{x_k}{|x_k|} - y \right\| \leq \sum_{i=1}^n \left| a_i^k - a_i \right| \|e_i\| \]

which converges to 0. However, 

\[ \left\| \frac{x_k}{|x_k|} \right\| < \frac{1}{k} \]

and so, by continuity of $\|\cdot\|$ mentioned above, 

\[ \|y\| = \lim_{k \to \infty} \left\| \frac{x_k}{|x_k|} \right\| = 0 \]

Therefore, $y = 0$ but also $|y| = 1$, a contradiction. This proves the claim. 

Now consider why $\|A\| < \infty$. Let $c$ be as just described in the claim. 

\[ \sup\{\|Ax\|_m : \|x\|_n \leq 1\} \leq \sup\{\|Ax\|_m : |x| \leq c\} \]

Consider for $x, y$ with $|x|, |y| \leq c$ 

\[ \|Ax - Ay\| = \Big\| \sum_{i,j} A_{ij}(x_j - y_j) e_i \Big\| \leq \sum_{i,j} |A_{ij}|\,|x_j - y_j|\,\|e_i\| \leq C|x - y| \]

for some constant $C$. So $x \mapsto Ax$ is continuous. Since the norm $\|\cdot\|_m$ is continuous also, it follows 
from the extreme value theorem of calculus that $\|Ax\|_m$ achieves its maximum on the compact set 
$\{x : |x| \leq c\}$. Thus $\|A\|$ is well defined. The only other issue of significance is the triangle inequality. 
However, 

\[
\begin{aligned}
\|A + B\| &\equiv \max\{\|(A + B)x\|_m : \|x\|_n \leq 1\} \\
&\leq \max\{\|Ax\|_m + \|Bx\|_m : \|x\|_n \leq 1\} \\
&\leq \max\{\|Ax\|_m : \|x\|_n \leq 1\} + \max\{\|Bx\|_m : \|x\|_n \leq 1\} \\
&= \|A\| + \|B\|
\end{aligned}
\]

Obviously $\|A\| = 0$ if and only if $A = 0$. The rule for scalars is also immediate. ■ 

The operator norm is one way to describe the magnitude of a matrix. Earlier the Frobenius 
norm was discussed. The Frobenius norm is actually not used as much as the operator norm. Recall 
that the Frobenius norm involved considering the $m \times n$ matrix as a vector in $\mathbb{F}^{mn}$ and using the 
usual Euclidean norm. It can be shown that it really doesn't matter which norm you use in terms 
of estimates because they are all equivalent. This is discussed in Problem 25 below for those who 
have had a legitimate calculus course, not just the usual undergraduate offering. 
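
To make Definition 14.2.2 concrete, here is a small sketch which computes the operator norm when the max norm above is used on both $\mathbb{F}^n$ and $\mathbb{F}^m$, for a real matrix. It relies on the fact that $x \mapsto \|Ax\|$ is convex, so its maximum over the cube $\|x\| \leq 1$ occurs at a corner, that is, at a vector whose entries are all $\pm 1$. The function names and the sample matrices are illustrative assumptions, and checking all $2^n$ corners is only sensible for small $n$.

```python
from itertools import product


def max_norm(v):
    return max(abs(e) for e in v)


def apply(A, x):
    """Return the matrix vector product A x."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def operator_norm_max(A):
    """max{ ||Ax|| : ||x|| <= 1 } for the max norm, found by checking the cube's corners."""
    n = len(A[0])
    return max(max_norm(apply(A, corner))
               for corner in product((-1.0, 1.0), repeat=n))


A = [[1.0, -2.0, 3.0],
     [4.0, 0.0, -1.0]]
B = [[0.0, 1.0, 1.0],
     [2.0, -2.0, 0.5]]
A_plus_B = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

norm_A = operator_norm_max(A)
norm_B = operator_norm_max(B)
norm_sum = operator_norm_max(A_plus_B)
print(norm_A, norm_B, norm_sum)
```

For this particular norm the result agrees with the largest sum of absolute values along a row of the matrix, and the three printed numbers illustrate the triangle inequality of Lemma 14.2.3.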



14.3 The Condition Number* 

Let A be an m x n matrix and consider the problem Ax = b where it is assumed there is a unique 
solution to this problem. How does the solution change if A is changed a little bit and if b is changed 
a little bit? This is clearly an interesting question because you often do not know A and b exactly. 
If a small change in these quantities results in a large change in the solution x, then it seems clear 
this would be undesirable. In what follows ||-|| when applied to a matrix will always refer to the 
operator norm. 

Lemma 14.3.1 Let $A, B$ be $n \times n$ matrices. Then for $\|\cdot\|$ denoting the operator norm, 

\[ \|AB\| \leq \|A\|\,\|B\|. \]

Proof: This follows from the definition. Letting $\|x\| \leq 1$, it follows from the definition of the 
operator norm that 

\[ \|ABx\| \leq \|A\|\,\|Bx\| \leq \|A\|\,\|B\|\,\|x\| \leq \|A\|\,\|B\| \]

and so 

\[ \|AB\| \equiv \sup_{\|x\| \leq 1} \|ABx\| \leq \|A\|\,\|B\|. \ ■ \]

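Lemma 14.3.2 Let $A, B$ be $n \times n$ matrices such that $A^{-1}$ exists, and suppose $\|B\| < 1/\|A^{-1}\|$. 
Then $(A + B)^{-1}$ exists and 

\[ \left\|(A + B)^{-1}\right\| \leq \left\|A^{-1}\right\| \frac{1}{1 - \|A^{-1}B\|}. \]

The above formula makes sense because $\|A^{-1}B\| < 1$. 

Proof: By Lemma 14.3.1, 

\[ \left\|A^{-1}B\right\| \leq \left\|A^{-1}\right\| \|B\| < \left\|A^{-1}\right\| \frac{1}{\|A^{-1}\|} = 1 \]

Suppose $(A + B)x = 0$ for some $x \neq 0$. Then $0 = A\left(I + A^{-1}B\right)x$ and so since $A$ is one to one, 
$\left(I + A^{-1}B\right)x = 0$. Therefore, 

\[
\begin{aligned}
0 = \left\|\left(I + A^{-1}B\right)x\right\| &\geq \|x\| - \left\|A^{-1}Bx\right\| \\
&\geq \|x\| - \left\|A^{-1}B\right\| \|x\| = \left(1 - \left\|A^{-1}B\right\|\right)\|x\| > 0
\end{aligned}
\]

a contradiction. This also shows $\left(I + A^{-1}B\right)$ is one to one. Therefore, both $(A + B)^{-1}$ and 
$\left(I + A^{-1}B\right)^{-1}$ exist. Hence 

\[ (A + B)^{-1} = \left(A\left(I + A^{-1}B\right)\right)^{-1} = \left(I + A^{-1}B\right)^{-1}A^{-1} \]

Now if 

\[ x = \left(I + A^{-1}B\right)^{-1}y \]

for $\|y\| \leq 1$, then 

\[ \left(I + A^{-1}B\right)x = y \]

and so 

\[ \|x\|\left(1 - \left\|A^{-1}B\right\|\right) \leq \left\|x + A^{-1}Bx\right\| \leq \|y\| = 1 \]

and so 

\[ \|x\| = \left\|\left(I + A^{-1}B\right)^{-1}y\right\| \leq \frac{1}{1 - \|A^{-1}B\|} \]

Since $\|y\| \leq 1$ is arbitrary, this shows 

\[ \left\|\left(I + A^{-1}B\right)^{-1}\right\| \leq \frac{1}{1 - \|A^{-1}B\|} \]

Therefore, 

\[
\begin{aligned}
\left\|(A + B)^{-1}\right\| &= \left\|\left(I + A^{-1}B\right)^{-1}A^{-1}\right\| \\
&\leq \left\|A^{-1}\right\| \left\|\left(I + A^{-1}B\right)^{-1}\right\| \leq \left\|A^{-1}\right\| \frac{1}{1 - \|A^{-1}B\|} \ ■
\end{aligned}
\]

Proposition 14.3.3 Suppose $A$ is invertible, $b \neq 0$, $Ax = b$, and $A_1x_1 = b_1$ where $\|A - A_1\| < 
1/\|A^{-1}\|$. Then 

\[ \frac{\|x_1 - x\|}{\|x\|} \leq \frac{1}{1 - \|A^{-1}(A_1 - A)\|} \|A\| \left\|A^{-1}\right\| \left( \frac{\|A_1 - A\|}{\|A\|} + \frac{\|b - b_1\|}{\|b\|} \right). \tag{14.11} \]

Proof: It follows from the assumptions that 

\[ Ax - A_1x + A_1x - A_1x_1 = b - b_1. \]

Hence 

\[ A_1(x - x_1) = (A_1 - A)x + b - b_1. \]

Now $A_1 = (A + (A_1 - A))$ and so by the above lemma, $A_1^{-1}$ exists and so 

\[
\begin{aligned}
(x - x_1) &= A_1^{-1}(A_1 - A)x + A_1^{-1}(b - b_1) \\
&= (A + (A_1 - A))^{-1}(A_1 - A)x + (A + (A_1 - A))^{-1}(b - b_1)
\end{aligned}
\]

By the estimate in Lemma 14.3.2, 

\[ \|x - x_1\| \leq \frac{\|A^{-1}\|}{1 - \|A^{-1}(A_1 - A)\|} \left( \|A_1 - A\|\,\|x\| + \|b - b_1\| \right). \]

Dividing by $\|x\|$, 

\[ \frac{\|x - x_1\|}{\|x\|} \leq \frac{\|A^{-1}\|}{1 - \|A^{-1}(A_1 - A)\|} \left( \|A_1 - A\| + \frac{\|b - b_1\|}{\|x\|} \right) \tag{14.12} \]

Now $b = Ax = A\left(A^{-1}b\right)$ and so $\|b\| \leq \|A\| \left\|A^{-1}b\right\|$ and so 

\[ \|x\| = \left\|A^{-1}b\right\| \geq \|b\| / \|A\|. \]

Therefore, from 14.12, 

\[
\begin{aligned}
\frac{\|x - x_1\|}{\|x\|} &\leq \frac{\|A^{-1}\|}{1 - \|A^{-1}(A_1 - A)\|} \left( \frac{\|A\|\,\|A_1 - A\|}{\|A\|} + \frac{\|A\|\,\|b - b_1\|}{\|b\|} \right) \\
&= \frac{\|A^{-1}\|\,\|A\|}{1 - \|A^{-1}(A_1 - A)\|} \left( \frac{\|A_1 - A\|}{\|A\|} + \frac{\|b - b_1\|}{\|b\|} \right) \ ■
\end{aligned}
\]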

This shows that the number, $\|A^{-1}\|\,\|A\|$, controls how sensitive the relative change in the 
solution of $Ax = b$ is to small changes in $A$ and $b$. This number is called the condition number. It 
is bad when it is large because a small relative change in b, for example, could yield a large relative 
change in x. 
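
A tiny numerical experiment shows what a large condition number does. In the sketch below the matrix is nearly singular, only $b$ is perturbed, and the operator norm is the one coming from the max norm, which for a matrix equals the largest sum of absolute values along a row. The matrix, the two right sides, and all function names are made up for illustration. With $A_1 = A$, inequality 14.11 reduces to the statement that the relative change in $x$ is at most the condition number times the relative change in $b$.

```python
def inf_norm(v):
    return max(abs(e) for e in v)


def op_norm_inf(M):
    # Operator norm induced by the max norm: largest absolute row sum.
    return max(sum(abs(e) for e in row) for row in M)


def inverse_2x2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def solve_2x2(M, rhs):
    inv = inverse_2x2(M)
    return [inv[0][0] * rhs[0] + inv[0][1] * rhs[1],
            inv[1][0] * rhs[0] + inv[1][1] * rhs[1]]


A = [[1.0, 1.0],
     [1.0, 1.0001]]
b = [2.0, 2.0001]    # exact solution (1, 1)
b1 = [2.0, 2.0002]   # exact solution (0, 2)

x = solve_2x2(A, b)
x1 = solve_2x2(A, b1)

cond = op_norm_inf(A) * op_norm_inf(inverse_2x2(A))
rel_b = inf_norm([p - q for p, q in zip(b, b1)]) / inf_norm(b)
rel_x = inf_norm([p - q for p, q in zip(x, x1)]) / inf_norm(x)
print(cond, rel_b, rel_x)
```

Here a change in $b$ in the fourth decimal place moves the solution from roughly $(1, 1)$ to roughly $(0, 2)$, which the size of the condition number permits.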

14.4 Exercises 

1. Solve the system 

2 6 / \ z J \ 3 

using the Gauss Seidel method and the Jacobi method. Check your answer by also solving it 
using row operations. 

2. Solve the system 



4 


1 


1\ 


X 


1 


7 


2 


y 





2 


4 / 


\ z 




using the Gauss Seidel method and the Jacobi method. Check your answer by also solving it 
using row operations. 

3. Solve the system 




using the Gauss Seidel method and the Jacobi method. Check your answer by also solving it 
using row operations. 

4. Solve the system 





using the Gauss Seidel method and the Jacobi method. Check your answer by also solving it 
using row operations. 

5. Solve the system 



5 





1\ 


X 


1 


7 


1 


y 





2 


4 


\ z 




using the Gauss Seidel method and the Jacobi method. Check your answer by also solving it 
using row operations. 

6. Solve the system 




using the Gauss Seidel method and the Jacobi method. Check your answer by also solving it 
using row operations. 

7. If you are considering a system of the form $Ax = b$ and $A^{-1}$ does not exist, will either the 
Gauss Seidel or Jacobi methods work? Explain. What does this indicate about using either of 
these methods for finding eigenvectors for a given eigenvalue? 

8. Verify that 

\[ \|x\|_{\infty} \equiv \max\{|x_i|,\ i = 1, \cdots, n : x = (x_1, \cdots, x_n)\} \]

is a norm. Next verify that 

\[ \|x\|_1 \equiv \sum_{i=1}^n |x_i|, \quad x = (x_1, \cdots, x_n) \]

is also a norm on $\mathbb{F}^n$. 

9. Let $A$ be an $n \times n$ matrix. Denote by $\|A\|_2$ the operator norm taken with respect to the usual 
norm on $\mathbb{F}^n$. Show that 

\[ \|A\|_2 = \sigma_1 \]

where $\sigma_1$ is the largest singular value. Next explain why $\|A^{-1}\|_2 = 1/\sigma_n$ where $\sigma_n$ is the 
smallest singular value of $A$. Explain why the condition number reduces to $\sigma_1/\sigma_n$ if the 
operator norm is defined in terms of the usual norm, $|x| = \left(\sum_{j=1}^n |x_j|^2\right)^{1/2}$. 

10. Let $p, q > 1$ and $1/p + 1/q = 1$. Consider the following picture. 

Using elementary calculus, verify that for $a, b > 0$, 

\[ \frac{a^p}{p} + \frac{b^q}{q} \geq ab. \]

11. ↑For $p > 1$, the $p$ norm on $\mathbb{F}^n$ is defined by 

\[ \|x\|_p \equiv \left( \sum_{k=1}^n |x_k|^p \right)^{1/p} \]

In fact, this is a norm and this will be shown in this and the next problem. Using the above 
problem in the context stated there where $p, q > 1$ and $1/p + 1/q = 1$, verify Holder's inequality 

\[ \sum_{k=1}^n |x_k|\,|y_k| \leq \|x\|_p \|y\|_q \]

Hint: You ought to consider the following. 

\[ \sum_{k=1}^n \frac{|x_k|}{\|x\|_p} \frac{|y_k|}{\|y\|_q} \]

Now use the result of the above problem. 

12. ↑Now for $p > 1$, verify that $\|x + y\|_p \leq \|x\|_p + \|y\|_p$. Then verify the other axioms of a norm. 
This will give an infinite collection of norms for $\mathbb{F}^n$. Hint: You might do the following. 

\[
\begin{aligned}
\|x + y\|_p^p &\leq \sum_{k=1}^n |x_k + y_k|^{p-1} \left( |x_k| + |y_k| \right) \\
&= \sum_{k=1}^n |x_k + y_k|^{p-1} |x_k| + \sum_{k=1}^n |x_k + y_k|^{p-1} |y_k|
\end{aligned}
\]

Now explain why $p - 1 = p/q$ and use the Holder inequality. 

13. This problem will reveal the best kept secret in undergraduate mathematics, the definition of 
the derivative of a function of $n$ variables. Let $\|\cdot\|$ be a norm on $\mathbb{F}^n$ and also denote by $\|\cdot\|$ a 
norm on $\mathbb{F}^m$. If you like, just use the standard norm on both $\mathbb{F}^n$ and $\mathbb{F}^m$. It can be shown that 
this doesn't matter at all (See Problem 25 on 474.) but to avoid possible confusion, you can 
be specific about the norm. A set $U \subseteq \mathbb{F}^n$ is said to be open if for every $x \in U$, there exists 
some $r_x > 0$ such that $B(x, r_x) \subseteq U$ where 

\[ B(x, r) \equiv \{y \in \mathbb{F}^n : \|y - x\| < r\} \]

This just says that if $U$ contains a point $x$ then it contains all the other points sufficiently near 
to $x$. Let $f : U \to \mathbb{F}^m$ be a function defined on $U$ having values in $\mathbb{F}^m$. Then $f$ is differentiable 
at $x \in U$ means that there exists an $m \times n$ matrix $A$ such that for every $\varepsilon > 0$, there exists a 
$\delta > 0$ such that whenever $0 < \|v\| < \delta$, it follows that 

\[ \frac{\|f(x + v) - f(x) - Av\|}{\|v\|} < \varepsilon \]

Stated more simply, 

\[ \lim_{\|v\| \to 0} \frac{\|f(x + v) - f(x) - Av\|}{\|v\|} = 0 \]

Show that $A$ is unique and verify that the $i$th column of $A$ is 

\[ \frac{\partial f}{\partial x_i}(x) \]

so in particular, all partial derivatives exist. This unique $m \times n$ matrix is called the derivative 
of $f$. It is written as $Df(x) = A$. 

Numerical Methods For Solving 
The Eigenvalue Problem 

15.1 The Power Method For Eigenvalues 

This chapter presents some simple ways to find eigenvalues and eigenvectors. It is only an intro- 
duction to this important subject. However, I hope to convey some of the ideas which are used. 
As indicated earlier, the eigenvalue eigenvector problem is extremely difficult. Consider for example 
what happens if you find an eigenvalue approximately. Then you can't find an approximate 
eigenvector by the straightforward approach because $A - \lambda I$ is invertible whenever $\lambda$ is not exactly equal 
to an eigenvalue. 

The power method allows you to approximate the largest eigenvalue and also the eigenvector 
which goes with it. By considering the inverse of the matrix, you can also find the smallest eigenvalue. 
The method works in the situation of a nondefective matrix $A$ which has a real eigenvalue of algebraic 
multiplicity 1, $\lambda_n$ which has the property that $|\lambda_k| < |\lambda_n|$ for all $k \neq n$. Such an eigenvalue is called 
a dominant eigenvalue. 

Let $\{x_1, \cdots, x_n\}$ be a basis of eigenvectors for $\mathbb{F}^n$ such that $Ax_n = \lambda_n x_n$. Now let $u_1$ be some 
nonzero vector. Since $\{x_1, \cdots, x_n\}$ is a basis, there exist unique scalars, $c_i$ such that 

\[ u_1 = \sum_{k=1}^n c_k x_k. \]

Assume you have not been so unlucky as to pick $u_1$ in such a way that $c_n = 0$. Then let $Au_k = u_{k+1}$ 
so that 

\[ u_m = A^m u_1 = \sum_{k=1}^{n-1} c_k \lambda_k^m x_k + \lambda_n^m c_n x_n. \tag{15.1} \]

For large $m$ the last term, $\lambda_n^m c_n x_n$, determines quite well the direction of the vector on the right. 
This is because $|\lambda_n|$ is larger than $|\lambda_k|$ for $k < n$ and so for a large $m$, the sum, $\sum_{k=1}^{n-1} c_k \lambda_k^m x_k$, on 
the right is fairly insignificant. Therefore, for large $m$, $u_m$ is essentially a multiple of the eigenvector 
$x_n$, the one which goes with $\lambda_n$. The only problem is that there is no control of the size of the 
vectors $u_m$. You can fix this by scaling. Let $S_2$ denote the entry of $Au_1$ which is largest in absolute 
value. We call this a scaling factor. Then $u_2$ will not be just $Au_1$ but $Au_1/S_2$. Next let $S_3$ denote 
the entry of $Au_2$ which has largest absolute value and define $u_3 \equiv Au_2/S_3$. Continue this way. The 
scaling just described does not destroy the relative insignificance of the term involving a sum in 
15.1. Indeed it amounts to nothing more than changing the units of length. Also note that from this 
scaling procedure, the absolute value of the largest element of $u_k$ is always equal to 1. Therefore, 
for large $m$, 

\[ u_m = \frac{\lambda_n^m c_n x_n}{S_2 S_3 \cdots S_m} + (\text{relatively insignificant term}). \]

Therefore, the entry of $Au_m$ which has the largest absolute value is essentially equal to the entry 
having largest absolute value of 

\[ A\left( \frac{\lambda_n^m c_n x_n}{S_2 S_3 \cdots S_m} \right) = \frac{\lambda_n^{m+1} c_n x_n}{S_2 S_3 \cdots S_m} \approx \lambda_n u_m \]

and so for large $m$, it must be the case that $\lambda_n \approx S_{m+1}$. This suggests the following procedure. 
Finding the largest eigenvalue with its eigenvector. 

1. Start with a vector $u_1$ which you hope has a component in the direction of $x_n$. The vector 
$(1, \cdots, 1)^T$ is usually a pretty good choice. 

2. If $u_k$ is known, 

\[ u_{k+1} = \frac{Au_k}{S_{k+1}} \]

where $S_{k+1}$ is the entry of $Au_k$ which has largest absolute value. 

3. When the scaling factors, $S_k$ are not changing much, $S_{k+1}$ will be close to the eigenvalue and 
$u_{k+1}$ will be close to an eigenvector. 

4. Check your answer to see if it worked well. 

In finding an initial vector, it is clear that if you start with a vector which isn't too far from an 
eigenvector, the process will work faster. Also, the computer is able to raise the matrix to a power 
quite easily. You might start with $A^p x$ for large $p$. As explained above, this will point in roughly 
the right direction. Then normalize it by dividing by the largest entry and use the resulting vector 
as your initial approximation. This ought to be close to an eigenvector and so the process would be 
expected to converge rapidly for this as an initial choice. 
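
The four step procedure above is short enough to write out directly. The following is a minimal sketch in plain Python. The function name `power_method`, the fixed number of steps, and the starting vector $(1, 1, 1)^T$ are illustrative assumptions. It uses the matrix of Example 15.1.1, and it does not test whether the scaling factors have stopped changing, which a more careful version would do.

```python
def matvec(A, v):
    """Return the matrix vector product A v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def power_method(A, u, steps):
    """Apply u_{k+1} = A u_k / S_{k+1} for a fixed number of steps.

    Returns the last scaling factor and the last scaled vector.
    """
    S = None
    for _ in range(steps):
        w = matvec(A, u)
        # Scaling factor: the entry of A u_k of largest absolute value (sign kept).
        S = max(w, key=abs)
        u = [e / S for e in w]
    return S, u


# Matrix of Example 15.1.1.
A = [[5, -14, 11],
     [-4, 4, -4],
     [3, 6, -3]]

S, u = power_method(A, [1.0, 1.0, 1.0], 25)
residual = [p - S * q for p, q in zip(matvec(A, u), u)]  # A u - S u, step 4 of the procedure
print(S, u, residual)
```

Step 4 of the procedure corresponds to the last line: if the residual $Au - Su$ has small entries, then $S$ and $u$ are a good approximate eigenvalue and eigenvector.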

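Example 15.1.1 Find the largest eigenvalue of

$$A = \begin{pmatrix} 5 & -14 & 11 \\ -4 & 4 & -4 \\ 3 & 6 & -3 \end{pmatrix}.$$

I will use the above suggestion.

$$\begin{pmatrix} 5 & -14 & 11 \\ -4 & 4 & -4 \\ 3 & 6 & -3 \end{pmatrix}^{15} \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 1.0271 \times 10^{16} \\ -5.1357 \times 10^{15} \\ 4.7018 \times 10^{11} \end{pmatrix}$$

Now divide by the largest entry to get the initial approximation for an eigenvector

$$\begin{pmatrix} 1.0271 \times 10^{16} \\ -5.1357 \times 10^{15} \\ 4.7018 \times 10^{11} \end{pmatrix} \frac{1}{1.0271 \times 10^{16}} = \begin{pmatrix} 1.0 \\ -0.50002 \\ 4.5777 \times 10^{-5} \end{pmatrix} = u_1$$

The power method will now be applied to find the largest eigenvalue for the above matrix
beginning with this vector.

$$\begin{pmatrix} 5 & -14 & 11 \\ -4 & 4 & -4 \\ 3 & 6 & -3 \end{pmatrix} \begin{pmatrix} 1.0 \\ -0.50002 \\ 4.5777 \times 10^{-5} \end{pmatrix} = \begin{pmatrix} 12.001 \\ -6.0003 \\ -2.5733 \times 10^{-4} \end{pmatrix}$$

Scaling this vector by dividing by the largest entry gives

$$\begin{pmatrix} 12.001 \\ -6.0003 \\ -2.5733 \times 10^{-4} \end{pmatrix} \frac{1}{12.001} = \begin{pmatrix} 1.0 \\ -0.49998 \\ -2.1442 \times 10^{-5} \end{pmatrix} = u_2$$

Now let's do it again.

$$\begin{pmatrix} 5 & -14 & 11 \\ -4 & 4 & -4 \\ 3 & 6 & -3 \end{pmatrix} \begin{pmatrix} 1.0 \\ -0.49998 \\ -2.1442 \times 10^{-5} \end{pmatrix} = \begin{pmatrix} 11.999 \\ -5.9998 \\ 1.8433 \times 10^{-4} \end{pmatrix}$$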

The new scaling factor is very close to the one just encountered. Therefore, it seems this is a
good place to stop. The eigenvalue is approximately 11.999 and the eigenvector is close to the one
obtained above. How well does it work? With the above equation, consider

$$11.999 \begin{pmatrix} 1.0 \\ -0.49998 \\ -2.1442 \times 10^{-5} \end{pmatrix} = \begin{pmatrix} 11.999 \\ -5.9993 \\ -2.5728 \times 10^{-4} \end{pmatrix}$$

These are clearly very close so this is a good approximation. In fact, the exact eigenvalue is 12 and
an eigenvector is

$$\begin{pmatrix} 1.0 \\ -0.5 \\ 0 \end{pmatrix}.$$

15.2 The Shifted Inverse Power Method

This method can find various eigenvalues and eigenvectors. It is a significant generalization of the
above simple procedure and yields very good results. The situation is this: You have a number $\alpha$
which is close to $\lambda$, some eigenvalue of an $n \times n$ matrix $A$. You don't know $\lambda$ but you know that
$\alpha$ is closer to $\lambda$ than to any other eigenvalue. Your problem is to find both $\lambda$ and an eigenvector
which goes with $\lambda$. Another way to look at this is to start with $\alpha$ and seek the eigenvalue $\lambda$ which
is closest to $\alpha$, along with an eigenvector associated with $\lambda$. If $\alpha$ is an eigenvalue of $A$, then you have
what you want. Therefore, we will always assume $\alpha$ is not an eigenvalue of $A$ and so $(A - \alpha I)^{-1}$
exists. When using this method it is nice to choose $\alpha$ fairly close to an eigenvalue. Otherwise, the
method will converge slowly. In order to get some idea where to start, you could use Gerschgorin's
theorem to get a rough idea where to look. The method is based on the following lemma.

Lemma 15.2.1 Let $\{\lambda_k\}_{k=1}^n$ be the eigenvalues of $A$, $\alpha$ not an eigenvalue. Then $x_k$ is an eigenvector
of $A$ for the eigenvalue $\lambda_k$ if and only if $x_k$ is an eigenvector for $(A - \alpha I)^{-1}$ corresponding to the
eigenvalue $\frac{1}{\lambda_k - \alpha}$.

Proof: Let $\lambda_k$ and $x_k$ be as described in the statement of the lemma. Then

$$(A - \alpha I)\, x_k = (\lambda_k - \alpha)\, x_k$$

if and only if

$$\frac{1}{\lambda_k - \alpha}\, x_k = (A - \alpha I)^{-1} x_k. \ ■$$

In explaining why the method works, we will assume $A$ is nondefective. This is not necessary!
One can use Gelfand's theorem on the spectral radius which is presented in [11] and invariance of
$(A - \alpha I)^{-1}$ on generalized eigenspaces to prove more general results. It suffices to assume that the
eigenspace for $\lambda_k$ has dimension equal to the multiplicity of the eigenvalue $\lambda_k$ but even this is not
necessary to obtain convergence of the method. This method is better than might be supposed from
the following explanation.

Pick $u_1$, an initial vector and let $Ax_k = \lambda_k x_k$, where $\{x_1, \cdots, x_n\}$ is a basis of eigenvectors
which exists from the assumption that $A$ is nondefective. Assume $\alpha$ is closer to $\lambda_n$ than to any other
eigenvalue. Since $A$ is nondefective, there exist constants $a_k$ such that

$$u_1 = \sum_{k=1}^n a_k x_k.$$

Possibly $\lambda_n$ is a repeated eigenvalue. Then combining the terms in the sum which involve eigenvectors
for $\lambda_n$, a simpler description of $u_1$ is

$$u_1 = \sum_{j=1}^m a_j x_j + y$$

where $y$ is an eigenvector for $\lambda_n$ which is assumed not equal to 0. (If you are unlucky in your choice
for $u_1$, this might not happen and things won't work.) Now the iteration procedure is defined as

$$u_{k+1} \equiv \frac{(A - \alpha I)^{-1} u_k}{S_{k+1}}$$

where $S_{k+1}$ is the element of $(A - \alpha I)^{-1} u_k$ which has largest absolute value. From Lemma 15.2.1,

$$u_{k+1} = \frac{\sum_{j=1}^m a_j \left(\frac{1}{\lambda_j - \alpha}\right)^k x_j + \left(\frac{1}{\lambda_n - \alpha}\right)^k y}{S_2 \cdots S_{k+1}} = \frac{\left(\frac{1}{\lambda_n - \alpha}\right)^k}{S_2 \cdots S_{k+1}} \left( \sum_{j=1}^m a_j \left(\frac{\lambda_n - \alpha}{\lambda_j - \alpha}\right)^k x_j + y \right).$$

Now it is being assumed that $\lambda_n$ is the eigenvalue which is closest to $\alpha$ and so for large $k$, the term

$$\sum_{j=1}^m a_j \left(\frac{\lambda_n - \alpha}{\lambda_j - \alpha}\right)^k x_j \equiv E_k$$

is very small, while for every $k > 1$, $u_k$ is a moderate sized vector because every entry has absolute
value less than or equal to 1. Thus

$$u_{k+1} = \frac{\left(\frac{1}{\lambda_n - \alpha}\right)^k}{S_2 \cdots S_{k+1}} (E_k + y) \equiv C_k (E_k + y)$$

where $E_k \to 0$, $y$ is some eigenvector for $\lambda_n$, and $C_k$ is of moderate size, remaining bounded as
$k \to \infty$. Therefore, for large $k$,

$$u_{k+1} - C_k y = C_k E_k \approx 0$$

and multiplying by $(A - \alpha I)^{-1}$ yields

$$(A - \alpha I)^{-1} u_{k+1} - (A - \alpha I)^{-1} C_k y = (A - \alpha I)^{-1} u_{k+1} - C_k \left(\frac{1}{\lambda_n - \alpha}\right) y$$

$$\approx (A - \alpha I)^{-1} u_{k+1} - \left(\frac{1}{\lambda_n - \alpha}\right) u_{k+1} \approx 0.$$

Therefore, for large $k$, $u_k$ is approximately equal to an eigenvector of $(A - \alpha I)^{-1}$. Therefore,

$$(A - \alpha I)^{-1} u_k \approx \frac{1}{\lambda_n - \alpha}\, u_k$$

and so you could take the dot product of both sides with $u_k$ and approximate $\lambda_n$ by solving the
following for $\lambda_n$.

$$\frac{(A - \alpha I)^{-1} u_k \cdot u_k}{|u_k|^2} = \frac{1}{\lambda_n - \alpha}$$



How else can you find the eigenvalue from this? Suppose $u_k = (w_1, \cdots, w_n)^T$ and from the
construction $|w_i| \leq 1$ and $w_k = 1$ for some $k$. Then

$$S_{k+1} u_{k+1} = (A - \alpha I)^{-1} u_k \approx (A - \alpha I)^{-1} (C_{k-1} y) = \frac{1}{\lambda_n - \alpha} (C_{k-1} y) \approx \frac{1}{\lambda_n - \alpha}\, u_k.$$

Hence the entry of $(A - \alpha I)^{-1} u_k$ which has largest absolute value is approximately $\frac{1}{\lambda_n - \alpha}$ and so it
is likely that you can estimate $\lambda_n$ using the formula

$$S_{k+1} = \frac{1}{\lambda_n - \alpha}.$$

Of course this would fail if $(A - \alpha I)^{-1} u_k$ had consistently more than one entry having equal absolute
value, but this is unlikely.

Here is how you use the shifted inverse power method to find the eigenvalue and
eigenvector closest to $\alpha$.

1. Find $(A - \alpha I)^{-1}$.

2. Pick $u_1$. It is important that $u_1 = \sum_{j=1}^m a_j x_j + y$ where $y$ is an eigenvector which goes with
the eigenvalue closest to $\alpha$ and the sum is in an "invariant subspace corresponding to the other
eigenvalues". Of course you have no way of knowing whether this is so but it typically is so. If
things don't work out, just start with a different $u_1$. You were phenomenally unlucky in your
choice.

3. If $u_k$ has been obtained,

$$u_{k+1} = \frac{(A - \alpha I)^{-1} u_k}{S_{k+1}}$$

where $S_{k+1}$ is the element of $(A - \alpha I)^{-1} u_k$ which has largest absolute value.

4. When the scaling factors $S_{k+1}$ are not changing much and the $u_k$ are not changing much, find
the approximation to the eigenvalue by solving

$$S_{k+1} = \frac{1}{\lambda - \alpha}$$

for $\lambda$. The eigenvector is approximated by $u_{k+1}$.

5. Check your work by multiplying by the original matrix to see how well what you have found
works.

Also note that this is just the power method applied to $(A - \alpha I)^{-1}$. The eigenvalue you want is
the one which makes $\frac{1}{\lambda - \alpha}$ as large as possible for all $\lambda \in \sigma(A)$. This is because making $\lambda - \alpha$ small
is the same as making $(\lambda - \alpha)^{-1}$ large.
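The steps above can be sketched in plain Python as follows. Two choices here are assumptions of this sketch rather than part of the procedure: instead of forming $(A - \alpha I)^{-1}$ explicitly, each step solves a linear system with a small Gaussian elimination helper (`solve`, a hypothetical name), and the eigenvalue is estimated with the dot product formula derived earlier rather than with the scaling factor, which avoids trouble when two entries tie for largest absolute value. The test matrix is the symmetric matrix of Example 15.2.3 with $\alpha = -0.855$.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]    # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                    # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def shifted_inverse_power(a, alpha, u, steps=20):
    """Return (lam, u) approximating the eigenpair of a closest to alpha."""
    n = len(a)
    shifted = [[a[i][j] - (alpha if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    mu = None
    for _ in range(steps):
        v = solve(shifted, u)                         # v = (A - alpha I)^{-1} u
        # Dot product estimate of 1 / (lambda - alpha).
        mu = sum(v_i * u_i for v_i, u_i in zip(v, u)) / sum(u_i * u_i for u_i in u)
        s = max(v, key=abs)                           # scaling factor
        u = [v_i / s for v_i in v]
    return alpha + 1.0 / mu, u


A = [[1.0, 2.0, 3.0], [2.0, 1.0, 4.0], [3.0, 4.0, 2.0]]
lam, u = shifted_inverse_power(A, -0.855, [1.0, 1.0, 1.0])
print(lam, u)
```

Solving a system at each step gives the same iterates as multiplying by the inverse, up to round off.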

Example 15.2.2 Find the eigenvalue of $A = \begin{pmatrix} 5 & -14 & 11 \\ -4 & 4 & -4 \\ 3 & 6 & -3 \end{pmatrix}$ which is closest to $-7$. Also
find an eigenvector which goes with this eigenvalue.

In this case the eigenvalues are $-6$, $0$, and $12$ so the correct answer is $-6$ for the eigenvalue.

$$(A + 7I)^{-1} = \begin{pmatrix} 0.51128 & 0.91729 & -0.48872 \\ 3.0075 \times 10^{-2} & 0.11278 & 3.0075 \times 10^{-2} \\ -0.42857 & -0.85714 & 0.57143 \end{pmatrix}$$

To get a good initial vector, follow the shortcut described earlier which works for the power method.

$$\begin{pmatrix} 0.51128 & 0.91729 & -0.48872 \\ 3.0075 \times 10^{-2} & 0.11278 & 3.0075 \times 10^{-2} \\ -0.42857 & -0.85714 & 0.57143 \end{pmatrix}^{23} \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 0.99999 \\ 4.8713 \times 10^{-20} \\ -0.99999 \end{pmatrix}$$

That middle term looks a lot like 0 so for an initial guess I will simply take it equal to 0. Also the
other term is very close to 1. Therefore, I will take it to equal 1. Then

$$\begin{pmatrix} 0.51128 & 0.91729 & -0.48872 \\ 3.0075 \times 10^{-2} & 0.11278 & 3.0075 \times 10^{-2} \\ -0.42857 & -0.85714 & 0.57143 \end{pmatrix} \begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix} = \begin{pmatrix} 1.0 \\ 0 \\ -1.0 \end{pmatrix}$$

It looks like we have found an eigenvector for $(A + 7I)^{-1}$. Then to find the eigenvalue desired, solve

$$\frac{1}{\lambda + 7} = 1$$

This yields $\lambda = -6$ which is actually the exact eigenvalue closest to $-7$.

Example 15.2.3 Consider the symmetric matrix $A = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 1 & 4 \\ 3 & 4 & 2 \end{pmatrix}$. Find the middle eigenvalue
and an eigenvector which goes with it.

Since $A$ is symmetric, it follows it has three real eigenvalues which are solutions to

$$p(\lambda) = \det\left( \lambda \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} - \begin{pmatrix} 1 & 2 & 3 \\ 2 & 1 & 4 \\ 3 & 4 & 2 \end{pmatrix} \right) = \lambda^3 - 4\lambda^2 - 24\lambda - 17 = 0$$

If you use your graphing calculator to graph this polynomial, you find there is an eigenvalue somewhere
between $-.9$ and $-.8$ and that this is the middle eigenvalue. Of course you could zoom in
and find it very accurately without much trouble but what about the eigenvector which goes with
it? If you try to solve

$$\left( (-.8) \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} - \begin{pmatrix} 1 & 2 & 3 \\ 2 & 1 & 4 \\ 3 & 4 & 2 \end{pmatrix} \right) \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}$$

there will be only the zero solution because the matrix on the left will be invertible and the same
will be true if you replace $-.8$ with a better approximation like $-.86$ or $-.855$. This is because all
these are only approximations to the eigenvalue and so the matrix in the above is nonsingular for
all of these. Therefore, you will only get the zero solution and

Eigenvectors are never equal to zero!

However, there exists such an eigenvector and you can find it using the shifted inverse power method.
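Pick $\alpha = -.855$. Then

$$(A - \alpha I)^{-1} = \left( \begin{pmatrix} 1 & 2 & 3 \\ 2 & 1 & 4 \\ 3 & 4 & 2 \end{pmatrix} + .855 \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \right)^{-1} = \begin{pmatrix} -367.5 & 215.96 & 83.601 \\ 215.96 & -127.17 & -48.753 \\ 83.601 & -48.753 & -19.191 \end{pmatrix}$$

Next use the power method for this matrix.

$$\begin{pmatrix} -367.5 & 215.96 & 83.601 \\ 215.96 & -127.17 & -48.753 \\ 83.601 & -48.753 & -19.191 \end{pmatrix}^{13} \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} -2.2822 \times 10^{34} \\ 1.3415 \times 10^{34} \\ 5.1837 \times 10^{33} \end{pmatrix}$$

Now divide by the largest entry to get a good first approximation to the eigenvector.

$$\begin{pmatrix} -2.2822 \times 10^{34} \\ 1.3415 \times 10^{34} \\ 5.1837 \times 10^{33} \end{pmatrix} \frac{1}{-2.2822 \times 10^{34}} = \begin{pmatrix} 1.0 \\ -0.58781 \\ -0.22714 \end{pmatrix}$$

$$\begin{pmatrix} -367.5 & 215.96 & 83.601 \\ 215.96 & -127.17 & -48.753 \\ 83.601 & -48.753 & -19.191 \end{pmatrix} \begin{pmatrix} 1.0 \\ -0.58781 \\ -0.22714 \end{pmatrix} = \begin{pmatrix} -513.43 \\ 301.79 \\ 116.62 \end{pmatrix}$$

Now divide by the largest entry to get the next approximation

$$\begin{pmatrix} -513.43 \\ 301.79 \\ 116.62 \end{pmatrix} \frac{1}{-513.43} = \begin{pmatrix} 1.0 \\ -0.58779 \\ -0.22714 \end{pmatrix}$$

Clearly this vector has not changed much and so the next scaling factor will be very close to the one
just considered. Hence the eigenvalue will be determined by solving

$$\frac{1}{\lambda + .855} = -513.43$$

It follows the approximate eigenvalue is about

$$\lambda = -0.85695$$

The approximate eigenvector is

$$\begin{pmatrix} 1.0 \\ -0.58781 \\ -0.22714 \end{pmatrix}$$

How well does it work?

$$\begin{pmatrix} 1 & 2 & 3 \\ 2 & 1 & 4 \\ 3 & 4 & 2 \end{pmatrix} \begin{pmatrix} 1.0 \\ -0.58781 \\ -0.22714 \end{pmatrix} = \begin{pmatrix} -0.85704 \\ 0.50363 \\ 0.19448 \end{pmatrix}$$

$$-0.85695 \begin{pmatrix} 1.0 \\ -0.58781 \\ -0.22714 \end{pmatrix} = \begin{pmatrix} -0.85695 \\ 0.50372 \\ 0.19465 \end{pmatrix}$$

This is pretty close. If you wanted to get closer, you could simply do more iterations. For practical
purposes, the eigenvalue and eigenvector have been obtained.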

Example 15.2.4 Find the eigenvalues and eigenvectors of the matrix

$$A = \begin{pmatrix} 2 & 1 & 3 \\ 2 & 1 & 1 \\ 3 & 2 & 1 \end{pmatrix}$$

This is only a $3 \times 3$ matrix and so it is not hard to estimate the eigenvalues. Just get the
characteristic equation, graph it using a calculator and zoom in to find the eigenvalues. If you do
this, you find there is an eigenvalue near $-1.2$, one near $-.4$, and one near $5.5$. (The characteristic
equation is $2 + 8\lambda + 4\lambda^2 - \lambda^3 = 0$.) Of course we have no idea what the eigenvectors are.

Let's first try to find the eigenvector and a better approximation for the eigenvalue near $-1.2$. In
this case, let $\alpha = -1.2$. Then

$$(A - \alpha I)^{-1} = \begin{pmatrix} -25.357143 & -33.928571 & 50.0 \\ 12.5 & 17.5 & -25.0 \\ 23.214286 & 30.357143 & -45.0 \end{pmatrix}$$

Then

$$\begin{pmatrix} -25.357143 & -33.928571 & 50.0 \\ 12.5 & 17.5 & -25.0 \\ 23.214286 & 30.357143 & -45.0 \end{pmatrix}^{17} \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} -4.9432 \times 10^{28} \\ 2.4312 \times 10^{28} \\ 4.4928 \times 10^{28} \end{pmatrix}$$

The initial approximation for an eigenvector will then be the above divided by its largest entry.

$$\begin{pmatrix} -4.9432 \times 10^{28} \\ 2.4312 \times 10^{28} \\ 4.4928 \times 10^{28} \end{pmatrix} \frac{1}{-4.9432 \times 10^{28}} = \begin{pmatrix} 1.0 \\ -0.49183 \\ -0.90888 \end{pmatrix}$$

Next

$$\begin{pmatrix} -25.357143 & -33.928571 & 50.0 \\ 12.5 & 17.5 & -25.0 \\ 23.214286 & 30.357143 & -45.0 \end{pmatrix} \begin{pmatrix} 1.0 \\ -0.49183 \\ -0.90888 \end{pmatrix} = \begin{pmatrix} -54.114 \\ 26.615 \\ 49.183 \end{pmatrix}$$

Normalizing this, the next approximation is

$$\begin{pmatrix} -54.114 \\ 26.615 \\ 49.183 \end{pmatrix} \frac{1}{-54.114} = \begin{pmatrix} 1.0 \\ -0.49183 \\ -0.90888 \end{pmatrix}$$

which is essentially the same vector. Hence this vector is the approximate eigenvector and to find
the eigenvalue you solve

$$\frac{1}{\lambda + 1.2} = -54.114$$

Thus the eigenvalue is $\lambda = -1.2185$. How well does it work?

$$\begin{pmatrix} 2 & 1 & 3 \\ 2 & 1 & 1 \\ 3 & 2 & 1 \end{pmatrix} \begin{pmatrix} 1.0 \\ -0.49183 \\ -0.90888 \end{pmatrix} = \begin{pmatrix} -1.2185 \\ 0.59929 \\ 1.1075 \end{pmatrix}$$

while

$$-1.2185 \begin{pmatrix} 1.0 \\ -0.49183 \\ -0.90888 \end{pmatrix} = \begin{pmatrix} -1.2185 \\ 0.59929 \\ 1.1075 \end{pmatrix}$$

For practical purposes, this has found the eigenvalue near $-1.2$ as well as an eigenvector associated
with it.

Next we shall find the eigenvector and a more precise value for the eigenvalue near $-.4$. In this
case,

$$(A - \alpha I)^{-1} = \begin{pmatrix} 8.0645161 \times 10^{-2} & -9.2741935 & 6.4516129 \\ -.40322581 & 11.370968 & -7.2580645 \\ .40322581 & 3.6290323 & -2.7419355 \end{pmatrix}$$

The first approximation to an eigenvector can be obtained as before.

$$\begin{pmatrix} 8.0645161 \times 10^{-2} & -9.2741935 & 6.4516129 \\ -.40322581 & 11.370968 & -7.2580645 \\ .40322581 & 3.6290323 & -2.7419355 \end{pmatrix}^{17} \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} -1.8535 \times 10^{16} \\ 2.3724 \times 10^{16} \\ 6.2874 \times 10^{15} \end{pmatrix}$$

The first choice for an approximate eigenvector is

$$\begin{pmatrix} -1.8535 \times 10^{16} \\ 2.3724 \times 10^{16} \\ 6.2874 \times 10^{15} \end{pmatrix} \frac{1}{2.3724 \times 10^{16}} = \begin{pmatrix} -0.78128 \\ 1.0 \\ 0.26502 \end{pmatrix}$$

Now

$$\begin{pmatrix} 8.0645161 \times 10^{-2} & -9.2741935 & 6.4516129 \\ -.40322581 & 11.370968 & -7.2580645 \\ .40322581 & 3.6290323 & -2.7419355 \end{pmatrix} \begin{pmatrix} -0.78128 \\ 1.0 \\ 0.26502 \end{pmatrix} = \begin{pmatrix} -7.6274 \\ 9.7625 \\ 2.5873 \end{pmatrix}$$

Hence the next approximate eigenvector is

$$\begin{pmatrix} -7.6274 \\ 9.7625 \\ 2.5873 \end{pmatrix} \frac{1}{9.7625} = \begin{pmatrix} -0.78130 \\ 1.0 \\ 0.26502 \end{pmatrix}$$

This is pretty close to the same vector and so the next scaling factor will be essentially the same
as the one just used. Thus the above vector is a good approximate eigenvector and an approximate
eigenvalue is obtained by solving

$$\frac{1}{\lambda + .4} = 9.7625$$

Hence $\lambda = -0.29757$. How well does it work?

$$\begin{pmatrix} 2 & 1 & 3 \\ 2 & 1 & 1 \\ 3 & 2 & 1 \end{pmatrix} \begin{pmatrix} -0.78130 \\ 1.0 \\ 0.26502 \end{pmatrix} = \begin{pmatrix} 0.23246 \\ -0.29758 \\ -0.07888 \end{pmatrix}$$

$$-0.29757 \begin{pmatrix} -0.78130 \\ 1.0 \\ 0.26502 \end{pmatrix} = \begin{pmatrix} 0.23249 \\ -0.29757 \\ -7.8862 \times 10^{-2} \end{pmatrix}$$

It works pretty well. For practical purposes, the eigenvalue and eigenvector have now been found.
If you want better accuracy, you could just continue iterating.

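Next we will find the eigenvalue and eigenvector for the eigenvalue near $5.5$. In this case,

$$(A - \alpha I)^{-1} = \begin{pmatrix} 29.2 & 16.8 & 23.2 \\ 19.2 & 10.8 & 15.2 \\ 28.0 & 16.0 & 22.0 \end{pmatrix}$$

As before, I have no idea what the eigenvector is but to avoid giving the impression that you always
need to start with the vector $(1, 1, 1)^T$, let $u_1 = (1, 2, 3)^T$. What follows is the iteration without all
the comments between steps.

$$\begin{pmatrix} 29.2 & 16.8 & 23.2 \\ 19.2 & 10.8 & 15.2 \\ 28.0 & 16.0 & 22.0 \end{pmatrix} \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} = \begin{pmatrix} 1.324 \times 10^{2} \\ 86.4 \\ 1.26 \times 10^{2} \end{pmatrix}$$

$S_1 = 86.4$.

$$u_2 = \begin{pmatrix} 1.5324074 \\ 1.0 \\ 1.4583333 \end{pmatrix}$$

$$\begin{pmatrix} 29.2 & 16.8 & 23.2 \\ 19.2 & 10.8 & 15.2 \\ 28.0 & 16.0 & 22.0 \end{pmatrix} \begin{pmatrix} 1.5324074 \\ 1.0 \\ 1.4583333 \end{pmatrix} = \begin{pmatrix} 95.379629 \\ 62.388888 \\ 90.99074 \end{pmatrix}$$

$S_2 = 95.379629$.

$$u_3 = \begin{pmatrix} 1.0 \\ .65411125 \\ .95398505 \end{pmatrix}$$

$$\begin{pmatrix} 29.2 & 16.8 & 23.2 \\ 19.2 & 10.8 & 15.2 \\ 28.0 & 16.0 & 22.0 \end{pmatrix} \begin{pmatrix} 1.0 \\ .65411125 \\ .95398505 \end{pmatrix} = \begin{pmatrix} 62.321522 \\ 40.764974 \\ 59.453451 \end{pmatrix}$$

$S_3 = 62.321522$.

$$u_4 = \begin{pmatrix} 1.0 \\ .65410748 \\ .95397945 \end{pmatrix}$$

$$\begin{pmatrix} 29.2 & 16.8 & 23.2 \\ 19.2 & 10.8 & 15.2 \\ 28.0 & 16.0 & 22.0 \end{pmatrix} \begin{pmatrix} 1.0 \\ .65410748 \\ .95397945 \end{pmatrix} = \begin{pmatrix} 62.321329 \\ 40.764848 \\ 59.453268 \end{pmatrix}$$

$S_4 = 62.321329$. Looks like it is time to stop because this scaling factor is not changing much from
$S_3$.

$$u_5 = \begin{pmatrix} 1.0 \\ .65410749 \\ .95397946 \end{pmatrix}$$

Then the approximation of the eigenvalue is gotten by solving

$$62.321329 = \frac{1}{\lambda - 5.5}$$

which gives $\lambda = 5.5160459$. Let's see how well it works.

$$\begin{pmatrix} 2 & 1 & 3 \\ 2 & 1 & 1 \\ 3 & 2 & 1 \end{pmatrix} \begin{pmatrix} 1.0 \\ .65410749 \\ .95397946 \end{pmatrix} = \begin{pmatrix} 5.5160459 \\ 3.608087 \\ 5.2621944 \end{pmatrix}$$

$$5.5160459 \begin{pmatrix} 1.0 \\ .65410749 \\ .95397946 \end{pmatrix} = \begin{pmatrix} 5.5160459 \\ 3.6080869 \\ 5.2621945 \end{pmatrix}$$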

15.2.1 Complex Eigenvalues 

What about complex eigenvalues? If your matrix is real, you won't see these by graphing the
characteristic equation on your calculator. Will the shifted inverse power method find these eigenvalues
and their associated eigenvectors? The answer is yes. However, for a real matrix, you must pick $\alpha$ to
be complex. This is because the eigenvalues occur in conjugate pairs, so if you don't pick it complex,
it will be the same distance between any conjugate pair of complex numbers and so nothing in the
above argument for convergence implies you will get convergence to a complex number. Also, the
process of iteration will yield only real vectors and scalars.

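Example 15.2.5 Find the complex eigenvalues and corresponding eigenvectors for the matrix

$$\begin{pmatrix} 5 & -8 & 6 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}$$

Here the characteristic equation is $\lambda^3 - 5\lambda^2 + 8\lambda - 6 = 0$. One solution is $\lambda = 3$. The other two
are $1 + i$ and $1 - i$. Apply the process to $\alpha = i$ so we will find the eigenvalue closest to $i$.

$$(A - \alpha I)^{-1} = \begin{pmatrix} -.02 - .14i & 1.24 + .68i & -.84 + .12i \\ -.14 + .02i & .68 - .24i & .12 + .84i \\ .02 + .14i & -.24 - .68i & .84 + .88i \end{pmatrix}$$

I shall use the trick of raising to a high power to begin with. Applying a high power of this matrix
to the vector $(1, 1, 1)^T$ gives

$$\begin{pmatrix} -0.4 + 0.8i \\ 0.2 + 0.6i \\ 0.4 + 0.2i \end{pmatrix}$$

Now normalize by dividing by the largest entry to get a good initial vector.

$$\begin{pmatrix} -0.4 + 0.8i \\ 0.2 + 0.6i \\ 0.4 + 0.2i \end{pmatrix} \frac{1}{-0.4 + 0.8i} = \begin{pmatrix} 1.0 \\ 0.5 - 0.5i \\ -0.5i \end{pmatrix}$$

Then

$$(A - \alpha I)^{-1} \begin{pmatrix} 1.0 \\ 0.5 - 0.5i \\ -0.5i \end{pmatrix} = \begin{pmatrix} 1.0 \\ 0.5 - 0.5i \\ -0.5i \end{pmatrix}$$

It looks like this has found the eigenvector exactly. Then to get the eigenvalue, you need to solve

$$\frac{1}{\lambda - i} = 1$$

Thus $\lambda = 1 + i$. How well does the above vector work?

$$\begin{pmatrix} 5 & -8 & 6 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} 1.0 \\ 0.5 - 0.5i \\ -0.5i \end{pmatrix} = \begin{pmatrix} 1.0 + 1.0i \\ 1.0 \\ 0.5 - 0.5i \end{pmatrix}$$

$$(1 + i) \begin{pmatrix} 1.0 \\ 0.5 - 0.5i \\ -0.5i \end{pmatrix} = \begin{pmatrix} 1.0 + 1.0i \\ 1.0 \\ 0.5 - 0.5i \end{pmatrix}$$

It appears that this has found the exact answer.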

This illustrates an interesting topic which leads to many related topics. If you have a polynomial,
$x^4 + ax^3 + bx^2 + cx + d$, you can consider it as the characteristic polynomial of a certain matrix,
called a companion matrix. In this case,

$$\begin{pmatrix} -a & -b & -c & -d \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}$$

The above example was just a companion matrix for $\lambda^3 - 5\lambda^2 + 8\lambda - 6$. You can see the pattern
which will enable you to obtain a companion matrix for any polynomial of the form
$\lambda^n + a_1 \lambda^{n-1} + \cdots + a_{n-1} \lambda + a_n$. This illustrates that one way to find the complex zeros of a polynomial is to use
the shifted inverse power method on a companion matrix for the polynomial. Doubtless there are
better ways but this does illustrate how impressive this procedure is. Do you have a better way?
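The companion matrix pattern is easy to automate. The following Python sketch builds the matrix from the coefficients $a_1, \cdots, a_n$ and checks one eigenpair of the example polynomial. The function names `companion` and `residual` are hypothetical, and the check relies on the fact that for a root $r$ the vector $(r^{n-1}, \cdots, r, 1)^T$ is an eigenvector of this matrix, which you can verify by multiplying it out.

```python
def companion(coeffs):
    """Companion matrix of x^n + a_1 x^(n-1) + ... + a_n, coeffs = [a_1, ..., a_n]."""
    n = len(coeffs)
    rows = [[-c for c in coeffs]]                 # first row: -a_1, ..., -a_n
    for i in range(1, n):
        rows.append([1.0 if j == i - 1 else 0.0 for j in range(n)])
    return rows


def residual(a, lam, x):
    """Largest modulus among the entries of A x - lam x."""
    ax = [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]
    return max(abs(ax_i - lam * x_i) for ax_i, x_i in zip(ax, x))


# lambda^3 - 5 lambda^2 + 8 lambda - 6, the polynomial of Example 15.2.5.
C = companion([-5.0, 8.0, -6.0])
root = 1 + 1j
x = [root ** 2, root, 1.0]                        # (r^2, r, 1) for the root r
print(C, residual(C, root, x))
```

Dividing `x` by its first entry gives $(1, 0.5 - 0.5i, -0.5i)^T$, the eigenvector found in the example.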

15.3 The Rayleigh Quotient 

There are many specialized results concerning the eigenvalues and eigenvectors for Hermitian
matrices. A matrix $A$ is Hermitian if $A = A^*$ where $A^*$ means to take the transpose of the conjugate
of $A$. In the case of a real matrix, Hermitian reduces to symmetric. Recall also that for $x \in \mathbb{F}^n$,

$$|x|^2 = x^* x = \sum_{j=1}^n |x_j|^2.$$

The following corollary gives the theoretical foundation for the spectral theory of Hermitian
matrices. It is a corollary of a theorem proved earlier; see Corollary 13.2.14 and Theorem 13.2.14
on Page 340.

Corollary 15.3.1 If A is Hermitian, then all the eigenvalues of A are real and there exists an 
orthonormal basis of eigenvectors. 



Thus for $\{x_k\}_{k=1}^n$ this orthonormal basis,

$$x_i^* x_j = \delta_{ij} \equiv \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases}$$

For $x \in \mathbb{F}^n$, $x \neq 0$, the Rayleigh quotient is defined by

$$\frac{x^* A x}{|x|^2}.$$

Now let the eigenvalues of $A$ be $\lambda_1 \leq \lambda_2 \leq \cdots \leq \lambda_n$ and $Ax_k = \lambda_k x_k$ where $\{x_k\}_{k=1}^n$ is the above
orthonormal basis of eigenvectors mentioned in the corollary. Then if $x$ is an arbitrary vector, there
exist constants $a_i$ such that

$$x = \sum_{i=1}^n a_i x_i.$$

Also,

$$|x|^2 = \sum_{i=1}^n \overline{a_i}\, x_i^* \sum_{j=1}^n a_j x_j = \sum_{ij} \overline{a_i} a_j\, x_i^* x_j = \sum_{ij} \overline{a_i} a_j \delta_{ij} = \sum_{i=1}^n |a_i|^2.$$

Therefore,

$$\frac{x^* A x}{|x|^2} = \frac{\left( \sum_{i=1}^n \overline{a_i}\, x_i^* \right) \left( \sum_{j=1}^n a_j \lambda_j x_j \right)}{\sum_{i=1}^n |a_i|^2} = \frac{\sum_{ij} \overline{a_i} a_j \lambda_j\, x_i^* x_j}{\sum_{i=1}^n |a_i|^2} = \frac{\sum_{ij} \overline{a_i} a_j \lambda_j \delta_{ij}}{\sum_{i=1}^n |a_i|^2} = \frac{\sum_{i=1}^n |a_i|^2 \lambda_i}{\sum_{i=1}^n |a_i|^2} \in [\lambda_1, \lambda_n].$$

In other words, the Rayleigh quotient is always between the largest and the smallest eigenvalues of
$A$. When $x = x_n$, the Rayleigh quotient equals the largest eigenvalue and when $x = x_1$ the Rayleigh
quotient equals the smallest eigenvalue. Suppose you calculate a Rayleigh quotient. How close is it
to some eigenvalue?

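Theorem 15.3.2 Let $x \neq 0$ and form the Rayleigh quotient,

$$\frac{x^* A x}{|x|^2} \equiv q.$$

Then there exists an eigenvalue of $A$, denoted here by $\lambda_q$ such that

$$|\lambda_q - q| \leq \frac{|Ax - qx|}{|x|}. \qquad (15.2)$$

Proof: Let $x = \sum_{k=1}^n a_k x_k$ where $\{x_k\}_{k=1}^n$ is the orthonormal basis of eigenvectors.

$$|Ax - qx|^2 = (Ax - qx)^* (Ax - qx)$$

$$= \left( \sum_{k=1}^n a_k \lambda_k x_k - q a_k x_k \right)^* \left( \sum_{k=1}^n a_k \lambda_k x_k - q a_k x_k \right)$$

$$= \left( \sum_{j=1}^n (\lambda_j - q)\, \overline{a_j}\, x_j^* \right) \left( \sum_{k=1}^n (\lambda_k - q)\, a_k x_k \right)$$

$$= \sum_{j,k} (\lambda_j - q)\, \overline{a_j}\, (\lambda_k - q)\, a_k\, x_j^* x_k = \sum_{k=1}^n |a_k|^2 (\lambda_k - q)^2$$

Now pick the eigenvalue $\lambda_q$ which is closest to $q$. Then

$$|Ax - qx|^2 = \sum_{k=1}^n |a_k|^2 (\lambda_k - q)^2 \geq (\lambda_q - q)^2 \sum_{k=1}^n |a_k|^2 = (\lambda_q - q)^2 |x|^2$$

which implies 15.2. ■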
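The estimate (15.2) is cheap to evaluate. Here is a minimal Python sketch for a real symmetric matrix; the function name `rayleigh_bound` is an assumption of this sketch, and the data are the matrix and vector $x = (1, 1, 1)^T$ of Example 15.3.3.

```python
import math


def rayleigh_bound(a, x):
    """Return (q, bound) for a real symmetric matrix a and nonzero vector x.

    q is the Rayleigh quotient and bound = |Ax - qx| / |x| as in (15.2).
    """
    ax = [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]
    norm_sq = sum(x_i * x_i for x_i in x)
    q = sum(x_i * ax_i for x_i, ax_i in zip(x, ax)) / norm_sq
    resid_sq = sum((ax_i - q * x_i) ** 2 for ax_i, x_i in zip(ax, x))
    return q, math.sqrt(resid_sq / norm_sq)


A = [[1.0, 2.0, 3.0], [2.0, 2.0, 1.0], [3.0, 1.0, 4.0]]
q, bound = rayleigh_bound(A, [1.0, 1.0, 1.0])
print(q, bound)   # some eigenvalue of A lies within `bound` of q
```

For a complex Hermitian matrix the dot products would need conjugates, which this real sketch leaves out.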

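Example 15.3.3 Consider the symmetric matrix $A = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 2 & 1 \\ 3 & 1 & 4 \end{pmatrix}$. Let
$x = (1, 1, 1)^T$. How close is the Rayleigh quotient to some eigenvalue of $A$? Find the eigenvector and eigenvalue to
several decimal places.

Everything is real and so there is no need to worry about taking conjugates. Therefore, the
Rayleigh quotient is

$$\frac{\begin{pmatrix} 1 & 1 & 1 \end{pmatrix} \begin{pmatrix} 1 & 2 & 3 \\ 2 & 2 & 1 \\ 3 & 1 & 4 \end{pmatrix} \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}}{3} = \frac{19}{3}$$

According to the above theorem, there is some eigenvalue of this matrix, $\lambda_q$ such that

$$\left| \lambda_q - \frac{19}{3} \right| \leq \frac{1}{\sqrt{3}} \left| \begin{pmatrix} 1 & 2 & 3 \\ 2 & 2 & 1 \\ 3 & 1 & 4 \end{pmatrix} \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} - \frac{19}{3} \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} \right| = \frac{1}{\sqrt{3}} \left| \begin{pmatrix} -\frac{1}{3} \\ -\frac{4}{3} \\ \frac{5}{3} \end{pmatrix} \right| = \frac{\sqrt{\left(\frac{1}{3}\right)^2 + \left(\frac{4}{3}\right)^2 + \left(\frac{5}{3}\right)^2}}{\sqrt{3}} = 1.2472$$

Could you find this eigenvalue and associated eigenvector? Of course you could. This is what
the inverse shifted power method is all about.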

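Solve

$$\left( \begin{pmatrix} 1 & 2 & 3 \\ 2 & 2 & 1 \\ 3 & 1 & 4 \end{pmatrix} - \frac{19}{3} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \right) \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$$

In other words solve

$$\begin{pmatrix} -\frac{16}{3} & 2 & 3 \\ 2 & -\frac{13}{3} & 1 \\ 3 & 1 & -\frac{7}{3} \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$$

and divide by the entry which is largest, 3.8707, to get

$$u_2 = \begin{pmatrix} .69925 \\ .49389 \\ 1.0 \end{pmatrix}$$

Now solve

$$\begin{pmatrix} -\frac{16}{3} & 2 & 3 \\ 2 & -\frac{13}{3} & 1 \\ 3 & 1 & -\frac{7}{3} \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} .69925 \\ .49389 \\ 1.0 \end{pmatrix}$$

and divide by the entry with largest absolute value, 2.9979, to get $u_3$.

Now solve

$$\begin{pmatrix} -\frac{16}{3} & 2 & 3 \\ 2 & -\frac{13}{3} & 1 \\ 3 & 1 & -\frac{7}{3} \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = u_3$$

and divide by the entry with largest absolute value, 3.0454, to get

$$u_4 = \begin{pmatrix} .7137 \\ .52056 \\ 1.0 \end{pmatrix}$$

Solve

$$\begin{pmatrix} -\frac{16}{3} & 2 & 3 \\ 2 & -\frac{13}{3} & 1 \\ 3 & 1 & -\frac{7}{3} \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} .7137 \\ .52056 \\ 1.0 \end{pmatrix}$$

and divide by the largest entry, 3.0421, to get

$$u_5 = \begin{pmatrix} .71378 \\ .52073 \\ 1.0 \end{pmatrix}$$

You can see these scaling factors are not changing much. The predicted eigenvalue is obtained by
solving

$$\frac{1}{\lambda - \frac{19}{3}} = 3.0421$$

to obtain $\lambda = 6.6621$. How close is this?

$$\begin{pmatrix} 1 & 2 & 3 \\ 2 & 2 & 1 \\ 3 & 1 & 4 \end{pmatrix} \begin{pmatrix} .71378 \\ .52073 \\ 1.0 \end{pmatrix} = \begin{pmatrix} 4.7552 \\ 3.469 \\ 6.6621 \end{pmatrix}$$

while

$$6.6621 \begin{pmatrix} .71378 \\ .52073 \\ 1.0 \end{pmatrix} = \begin{pmatrix} 4.7553 \\ 3.4692 \\ 6.6621 \end{pmatrix}$$

You see that for practical purposes, this has found the eigenvalue and an eigenvector.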

15.4 The QR Algorithm 

15.4.1 Basic Considerations 

The QR algorithm is one of the most remarkable techniques for finding eigenvalues. In this section, 
I will discuss this method. To see more on this algorithm, consult Golub and Van Loan [5]. For an 
explanation of why the algorithm works see Wilkinson [16]. There is also more discussion in Linear 
Algebra. This will only discuss real matrices for the sake of simplicity. Also, there is a lot more to 
this algorithm than will be presented here. First here is an introductory lemma. 

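Lemma 15.4.1 Suppose $A$ is a block upper triangular matrix,

$$A = \begin{pmatrix} B_1 & \cdots & * \\ & \ddots & \vdots \\ 0 & & B_r \end{pmatrix}$$

This means that the $B_i$ are $r_i \times r_i$ matrices whose diagonals are subsets of the main diagonal of $A$.
Then $\sigma(A) = \cup_{i=1}^r \sigma(B_i)$.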

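Proof: Say $Q_i^* B_i Q_i = T_i$ where $T_i$ is upper triangular. Such unitary matrices exist by Schur's
theorem. Then consider the similarity transformation,

$$\begin{pmatrix} Q_1^* & & 0 \\ & \ddots & \\ 0 & & Q_r^* \end{pmatrix} \begin{pmatrix} B_1 & \cdots & * \\ & \ddots & \vdots \\ 0 & & B_r \end{pmatrix} \begin{pmatrix} Q_1 & & 0 \\ & \ddots & \\ 0 & & Q_r \end{pmatrix}$$

By block multiplication this equals

$$\begin{pmatrix} Q_1^* & & 0 \\ & \ddots & \\ 0 & & Q_r^* \end{pmatrix} \begin{pmatrix} B_1 Q_1 & \cdots & * \\ & \ddots & \vdots \\ 0 & & B_r Q_r \end{pmatrix} = \begin{pmatrix} Q_1^* B_1 Q_1 & \cdots & * \\ & \ddots & \vdots \\ 0 & & Q_r^* B_r Q_r \end{pmatrix} = \begin{pmatrix} T_1 & \cdots & * \\ & \ddots & \vdots \\ 0 & & T_r \end{pmatrix}$$

Now this is a genuinely upper triangular matrix which is similar to $A$, so the eigenvalues of $A$ are
its diagonal entries. These consist of the union of the
eigenvalues of the $T_i$ which is the same as the union of the eigenvalues of the $B_i$. ■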
Here is the description of the great and glorious QR algorithm.

The QR Algorithm

Let $A$ be an $n \times n$ real matrix. Let $A_0 = A$. Suppose that $A_{k-1}$ has been found. To find $A_k$ let

$$A_{k-1} = Q_k R_k, \quad A_k = R_k Q_k,$$

where $Q_k R_k$ is a QR factorization of $A_{k-1}$. Thus $R$ is upper triangular with nonnegative entries on
the main diagonal and $Q$ is real and unitary (orthogonal).

The main significance of this algorithm is in the following easy but important theorem. 

Theorem 15.4.2 Let $A$ be any $n \times n$ complex matrix and let $\{A_k\}$ be the sequence of matrices
described above by the QR algorithm. Then each of these matrices is unitarily similar to $A$.

Proof: Clearly $A_0$ is orthogonally similar to $A$ because they are the same thing. Suppose then
that

$$A_{k-1} = Q^* A Q$$

Then from the algorithm,

$$A_{k-1} = Q_k R_k, \quad R_k = Q_k^* A_{k-1}$$

Therefore, from the algorithm,

$$A_k = R_k Q_k = Q_k^* A_{k-1} Q_k = Q_k^* Q^* A Q Q_k = (Q Q_k)^* A\, Q Q_k,$$

and so $A_k$ is unitarily similar to $A$ also. ■

Although the sequence $\{A_k\}$ may fail to converge, it is nevertheless often the case that for large
$k$, $A_k$ is of the form

$$A_k = \begin{pmatrix} B_1 & \cdots & * \\ & \ddots & \vdots \\ e & & B_r \end{pmatrix}$$

where the $B_i$ are blocks which run down the diagonal of the matrix, and all of the entries below
this block diagonal are very small. Then letting $T_B$ denote the matrix obtained by setting all of
these small entries equal to zero, one can argue, using methods of analysis, that the eigenvalues of
$A_k$ are close to the eigenvalues of $T_B$. From Lemma 15.4.1 the eigenvalues of $T_B$ are the eigenvalues
of the blocks $B_i$. Thus, the eigenvalues of $A$ are the same as those of $A_k$ and these are close to the
eigenvalues of $T_B$.
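To see the iteration $A_{k-1} = Q_k R_k$, $A_k = R_k Q_k$ in action, here is a small Python sketch. It is only an illustration under stated assumptions: the QR factorization is done with modified Gram Schmidt, which assumes the matrix is invertible, the function names are hypothetical, and the test matrix is the symmetric matrix of Example 15.2.3, whose eigenvalues have distinct absolute values, so the iterates approach a diagonal matrix.

```python
import math


def qr_gram_schmidt(a):
    """Return (q, r) with a = q r, q orthogonal, r upper triangular (a invertible)."""
    n = len(a)
    cols = [[a[i][j] for i in range(n)] for j in range(n)]   # columns of a
    q_cols = []
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = cols[j][:]
        for i, q in enumerate(q_cols):
            r[i][j] = sum(q[t] * v[t] for t in range(n))     # modified Gram Schmidt
            v = [v[t] - r[i][j] * q[t] for t in range(n)]
        r[j][j] = math.sqrt(sum(t * t for t in v))           # positive diagonal entry
        q_cols.append([t / r[j][j] for t in v])
    q = [[q_cols[j][i] for j in range(n)] for i in range(n)]
    return q, r


def matmul(x, y):
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def qr_iterates(a, steps):
    """Apply A_{k-1} = Q_k R_k, A_k = R_k Q_k the given number of times."""
    ak = [row[:] for row in a]
    for _ in range(steps):
        q, r = qr_gram_schmidt(ak)
        ak = matmul(r, q)
    return ak


A = [[1.0, 2.0, 3.0], [2.0, 1.0, 4.0], [3.0, 4.0, 2.0]]
Ak = qr_iterates(A, 200)
diag = [Ak[i][i] for i in range(3)]
print(diag)   # approximate eigenvalues of A
```

For matrices with complex eigenvalues the iterates do not become triangular; instead $2 \times 2$ blocks remain on the diagonal, as described above.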

In proving things about this algorithm and also for the sake of convenience, here is a technical 
result. 

Corollary 15.4.3 For $Q_k, R_k, A_k$ given in the QR algorithm,

$$A = Q_1 \cdots Q_k A_k Q_k^* \cdots Q_1^* \qquad (15.3)$$

For $Q^{(k)} \equiv Q_1 \cdots Q_k$ and $R^{(k)} \equiv R_k \cdots R_1$, it follows that

$$A^k = Q^{(k)} R^{(k)}$$

Here $A^k$ is the usual thing, $A$ raised to the $k^{th}$ power.

Proof: From the algorithm,

$$A = A_0 = Q_1 R_1, \quad Q_1^* A = R_1, \quad A_1 = R_1 Q_1 = Q_1^* A Q_1$$

Hence

$$Q_1 A_1 Q_1^* = A$$

Suppose the formula 15.3 holds for $k$. Then from the algorithm,

$$A_k = Q_{k+1} R_{k+1}, \quad R_{k+1} = Q_{k+1}^* A_k, \quad A_{k+1} = R_{k+1} Q_{k+1} = Q_{k+1}^* A_k Q_{k+1}$$

Hence $Q_{k+1} A_{k+1} Q_{k+1}^* = A_k$ and so

$$A = Q_1 \cdots Q_k A_k Q_k^* \cdots Q_1^* = Q_1 \cdots Q_k Q_{k+1} A_{k+1} Q_{k+1}^* Q_k^* \cdots Q_1^*$$

This shows the first part.

The second part is clearly true from the algorithm if $k = 1$. Then from the first part and the
algorithm,

$$A = Q_1 \cdots Q_k Q_{k+1} A_{k+1} Q_{k+1}^* Q_k^* \cdots Q_1^* = Q_1 \cdots Q_k Q_{k+1} R_{k+1} Q_{k+1} Q_{k+1}^* Q_k^* \cdots Q_1^*$$

It follows that

$$A^{k+1} = A A^k = Q_1 \cdots Q_k Q_{k+1} R_{k+1} Q_k^* \cdots Q_1^*\, Q^{(k)} R^{(k)} = Q^{(k+1)} R_{k+1} \left( Q^{(k)} \right)^* Q^{(k)} R^{(k)}$$

Hence

$$A^{k+1} = Q^{(k+1)} R_{k+1} R^{(k)} = Q^{(k+1)} R^{(k+1)} \ ■$$

Now suppose that $A^{-1}$ exists. How do two QR factorizations compare? Since $A^{-1}$ exists, it
would require that if $A = QR$, then $R^{-1}$ must exist. Now an upper triangular matrix has inverse
which is also upper triangular. This follows right away from the algorithm presented early in the
book for finding the inverse. If $A = Q_1 R_1 = Q_2 R_2$, then $Q_1^* Q_2 = R_1 R_2^{-1}$ and so $R_1 R_2^{-1}$ is an upper
triangular matrix which is also unitary and in addition has all positive entries down the diagonal.
For simplicity, call it $R$. Thus $R$ is upper triangular and $RR^* = R^*R = I$. It follows easily that $R$
must equal $I$ and so $R_1 = R_2$ which requires $Q_1 = Q_2$.

Now in the above corollary, you know that

$$A = Q_1 \cdots Q_k A_k Q_k^* \cdots Q_1^* = Q^{(k)} A_k \left( Q^{(k)} \right)^*$$

Also, from this corollary, you know that

$$A^k = Q^{(k)} R^{(k)}$$

You could also simply take the QR factorization of $A^k$ to obtain $A^k = QR$. Then from what was
just pointed out, if $A^{-1}$ exists,

$$Q^{(k)} = Q$$

Thus from the above corollary,

$$A_k = \left( Q^{(k)} \right)^* A\, Q^{(k)} = Q^* A Q$$

Therefore, in using the QR algorithm in the case where $A$ has an inverse, it suffices to take

$$A^k = QR$$

and then consider the matrix

$$Q^* A Q = A_k.$$

This is so theoretically. In practice it might not work out all that well because of round off errors.
There is also an interesting relation to the power method. Let

$$A = \begin{pmatrix} a_1 & \cdots & a_n \end{pmatrix}$$

Then from the way we multiply matrices,

$$A^{k+1} = \begin{pmatrix} A^k a_1 & \cdots & A^k a_n \end{pmatrix}$$

and for large $k$, $A^k a_1$ would be expected to point roughly in the direction of the eigenvector
corresponding to the largest eigenvalue. Then if you form the QR factorization,

$$A^{k+1} = QR$$

the columns of $Q$ are an orthonormal basis obtained essentially from the Gram Schmidt procedure.
Thus the first column of $Q$, call it $q_1$, has roughly the direction of an eigenvector associated with the largest
eigenvalue $\lambda_1$ of $A$. It follows that the first column of $Q^* A Q$, which is $Q^* A q_1$, is approximately equal to $\lambda_1 Q^* q_1$ and so the
top entry will be close to $\lambda_1 q_1^* q_1 = \lambda_1$ and the entries below it are close to 0. Thus the eigenvalues
of the matrix should be close to this top entry of the first column along with the eigenvalues of the
$(n - 1) \times (n - 1)$ matrix in the lower right corner. If this is a $2 \times 2$ you can find the eigenvalues
using the quadratic formula. If it is larger, you could just use the same procedure for finding its
eigenvalues but now you are dealing with a smaller matrix.

Example 15.4.4 Find the eigenvalues of the matrix

$$A = \begin{pmatrix} 5 & 4 & 3 \\ 2 & 3 & 2 \\ -8 & -9 & -6 \end{pmatrix}$$

First use the computer to raise this matrix to a large power.

$$\begin{pmatrix} 5 & 4 & 3 \\ 2 & 3 & 2 \\ -8 & -9 & -6 \end{pmatrix}^{27} = \begin{pmatrix} 2.6844 \times 10^{8} & 1.3422 \times 10^{8} & 1.3422 \times 10^{8} \\ -2.0 & -3.0 & -2.0 \\ -2.6844 \times 10^{8} & -1.3422 \times 10^{8} & -1.3422 \times 10^{8} \end{pmatrix}$$

Now find the QR factorization of this last matrix. The $Q$ equals

$$\begin{pmatrix} 0.70711 & -3.7252 \times 10^{-9} & -0.70711 \\ -5.2683 \times 10^{-9} & -1.000 & 1.4416 \times 10^{-21} \\ -0.70711 & 3.7252 \times 10^{-9} & -0.70711 \end{pmatrix}$$

Next examine

$$\begin{pmatrix} 0.70711 & -3.7252 \times 10^{-9} & -0.70711 \\ -5.2683 \times 10^{-9} & -1.000 & 1.4416 \times 10^{-21} \\ -0.70711 & 3.7252 \times 10^{-9} & -0.70711 \end{pmatrix}^T \begin{pmatrix} 5 & 4 & 3 \\ 2 & 3 & 2 \\ -8 & -9 & -6 \end{pmatrix} \begin{pmatrix} 0.70711 & -3.7252 \times 10^{-9} & -0.70711 \\ -5.2683 \times 10^{-9} & -1.000 & 1.4416 \times 10^{-21} \\ -0.70711 & 3.7252 \times 10^{-9} & -0.70711 \end{pmatrix}$$

$$= \begin{pmatrix} 2.0 & -9.1924 & -11.0 \\ 5.2684 \times 10^{-9} & 3.0 & 2.8284 \\ -1.8626 \times 10^{-8} & -3.5356 & -3.0 \end{pmatrix}$$

You see now that this is essentially equal to the block upper triangular matrix

$$\begin{pmatrix} 2.0 & -9.1924 & -11.0 \\ 0 & 3.0 & 2.8284 \\ 0 & -3.5356 & -3.0 \end{pmatrix}$$

and the eigenvalues of this matrix can be easily identified. They are 2 and the eigenvalues of the
block

$$\begin{pmatrix} 3.0 & 2.8284 \\ -3.5356 & -3.0 \end{pmatrix}$$

But these can be found easily using the quadratic formula. This yields $i, -i$. In fact the exact
eigenvalues for this matrix are $2, i, -i$.
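The quadratic formula step for a $2 \times 2$ block is short enough to write out. This Python sketch uses the complex square root from the standard `cmath` module so that complex pairs come out directly; the function name `eig_2x2` is an assumption of this sketch, and the block is the one from the example with its entries as printed (rounded).

```python
import cmath


def eig_2x2(block):
    """Eigenvalues of a 2 x 2 matrix from the quadratic formula.

    They are the roots of t^2 - (trace) t + (determinant) = 0.
    """
    (a, b), (c, d) = block
    trace = a + d
    det = a * d - b * c
    disc = cmath.sqrt(trace * trace - 4.0 * det)   # complex square root
    return (trace + disc) / 2.0, (trace - disc) / 2.0


# Lower right block from Example 15.4.4.
lam1, lam2 = eig_2x2([[3.0, 2.8284], [-3.5356, -3.0]])
print(lam1, lam2)   # close to i and -i, up to the rounding of the entries
```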

15.4.2 The Upper Hessenberg Form 

Actually, when using the QR algorithm, contrary to what I have done above, you should always deal 
with a matrix which is similar to the given matrix which is in upper Hessenberg form. This means 
all the entries below the sub diagonal equal 0. Here is an easy lemma. 

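Lemma 15.4.5 Let $A$ be an $n \times n$ matrix. Then it is unitarily similar to a matrix in upper
Hessenberg form and this similarity can be computed.

Proof: Let $A$ be an $n \times n$ matrix. Suppose $n > 2$. There is nothing to show otherwise. Write

$$A = \begin{pmatrix} a & b \\ d & A_1 \end{pmatrix}$$

where $A_1$ is $(n-1) \times (n-1)$. Consider the $(n-1) \times 1$ matrix $d$. Then let $Q$ be a Householder reflection
such that

$$Qd = \begin{pmatrix} c \\ 0 \end{pmatrix} \equiv \mathbf{c}$$

Then

$$\begin{pmatrix} 1 & 0 \\ 0 & Q \end{pmatrix} \begin{pmatrix} a & b \\ d & A_1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & Q^* \end{pmatrix} = \begin{pmatrix} a & bQ^* \\ \mathbf{c} & Q A_1 Q^* \end{pmatrix}$$

By similar reasoning, there exists an $(n-1) \times (n-1)$ matrix

$$U = \begin{pmatrix} 1 & 0 \\ 0 & Q_1 \end{pmatrix}$$

such that $U\, Q A_1 Q^*\, U^*$ has only zeros in its first column below the sub diagonal entry. Thus

$$\begin{pmatrix} 1 & 0 \\ 0 & U \end{pmatrix} \begin{pmatrix} a & bQ^* \\ \mathbf{c} & Q A_1 Q^* \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & U^* \end{pmatrix}$$

will have all zeros below the first two entries on the sub diagonal. Continuing this way shows the
result. ■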
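The proof is constructive, and the construction can be sketched in Python for a real matrix. This is a minimal illustration, not a tuned routine: the function name `hessenberg` and the $4 \times 4$ test matrix are made up for the example, and each Householder reflection $I - 2vv^T$ is applied directly to rows and columns instead of being formed as a matrix.

```python
import math


def hessenberg(a):
    """Return a matrix orthogonally similar to a with zeros below the sub diagonal."""
    n = len(a)
    h = [row[:] for row in a]
    for k in range(n - 2):
        x = [h[i][k] for i in range(k + 1, n)]        # part of column k below the diagonal
        norm_x = math.sqrt(sum(t * t for t in x))
        if norm_x == 0.0:
            continue                                  # nothing to eliminate
        v = x[:]
        v[0] += math.copysign(norm_x, x[0])           # Householder vector
        norm_v = math.sqrt(sum(t * t for t in v))
        v = [t / norm_v for t in v]
        m = len(v)
        # Multiply on the left: rows k+1, ..., n-1 are reflected.
        for j in range(n):
            dot = sum(v[i] * h[k + 1 + i][j] for i in range(m))
            for i in range(m):
                h[k + 1 + i][j] -= 2.0 * v[i] * dot
        # Multiply on the right: columns k+1, ..., n-1 are reflected.
        for i in range(n):
            dot = sum(h[i][k + 1 + j] * v[j] for j in range(m))
            for j in range(m):
                h[i][k + 1 + j] -= 2.0 * dot * v[j]
    return h


A = [[4.0, 1.0, 2.0, 3.0],
     [1.0, 3.0, 0.0, 1.0],
     [2.0, 0.0, 2.0, 1.0],
     [3.0, 1.0, 1.0, 5.0]]
H = hessenberg(A)
for row in H:
    print(row)
```

Because of round off, the entries below the sub diagonal come out as tiny numbers rather than exact zeros.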

The reason you should use a matrix which is upper Hessenberg and similar to A in the QR 
algorithm is that the algorithm keeps returning a matrix in upper Hessenberg form and if you are 
looking for block upper triangular matrices, this will force the size of the blocks to be no larger than 
2x2 which are easy to handle using the quadratic formula. This is in the following lemma. 

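Lemma 15.4.6 Let $\{A_k\}$ be the sequence of iterates from the QR algorithm, where $A^{-1}$ exists. Then if $A_k$ is upper Hessenberg, so is $A_{k+1}$.

Proof: The matrix is upper Hessenberg means that $A_{ij} = 0$ whenever $i - j \geq 2$.

\[
A_{k+1} = R_k Q_k
\]

where $A_k = Q_k R_k$. Therefore $A_k R_k^{-1} = Q_k$ and so

\[
A_{k+1} = R_k Q_k = R_k A_k R_k^{-1}.
\]

Let the $ij^{th}$ entry of $A_k$ be $a_{ij}^{k}$, and let $r_{ip}$ and $r_{qj}^{-1}$ denote the entries of the upper triangular matrices $R_k$ and $R_k^{-1}$. Then if $i - j \geq 2$,

\[
a_{ij}^{k+1} = \sum_{p=i}^{n} \sum_{q=1}^{j} r_{ip}\, a_{pq}^{k}\, r_{qj}^{-1}.
\]

It is given that $a_{pq}^{k} = 0$ whenever $p - q \geq 2$. However, from the above sum,

\[
p - q \geq i - j \geq 2,
\]

and so the sum equals 0. ■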
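
The lemma is easy to watch numerically. The following sketch applies a few unshifted QR steps to an invertible upper Hessenberg matrix and records whether each iterate is still upper Hessenberg. The test matrix and the helper names `qr_step` and `is_upper_hessenberg` are assumptions made for illustration.

```python
import numpy as np

def qr_step(A_k):
    """One step of the unshifted QR algorithm: factor A_k = Q R and return R Q."""
    Q, R = np.linalg.qr(A_k)
    return R @ Q

def is_upper_hessenberg(M, tol=1e-12):
    """True when every entry below the sub diagonal is zero up to tol."""
    return bool(np.all(np.abs(np.tril(M, -2)) <= tol))

# An invertible upper Hessenberg matrix chosen only for illustration (its determinant is 3).
A_k = np.array([[2.0, 1.0, 3.0, 4.0],
                [1.0, 3.0, 1.0, 2.0],
                [0.0, 2.0, 1.0, 5.0],
                [0.0, 0.0, 1.0, 4.0]])

hessenberg_flags = []
for _ in range(5):
    A_k = qr_step(A_k)
    hessenberg_flags.append(is_upper_hessenberg(A_k))
print(hessenberg_flags)
```

In floating point the entries below the sub diagonal are zero only up to round off, which is why the check uses a small tolerance.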

Example 15.4.7 Find the solutions to the equation $x^4 - 4x^3 + 8x^2 - 8x + 4 = 0$ using the QR algorithm.

This is the characteristic equation of the matrix

\[
\begin{pmatrix}
4 & -8 & 8 & -4 \\
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0
\end{pmatrix}
\]

Since the constant term in the equation is not 0, it follows that the matrix has an inverse. It is already in upper Hessenberg form. Let's apply the algorithm.



\[
\begin{pmatrix}
4 & -8 & 8 & -4 \\
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0
\end{pmatrix}^{55}
=
\begin{pmatrix}
-7.5162 \times 10^{9} & 3.0333 \times 10^{10} & -4.5097 \times 10^{10} & 3.0065 \times 10^{10} \\
-7.5162 \times 10^{9} & 2.2549 \times 10^{10} & -2.9796 \times 10^{10} & 1.5032 \times 10^{10} \\
-3.7581 \times 10^{9} & 7.5162 \times 10^{9} & -7.5162 \times 10^{9} & 2.6844 \times 10^{8} \\
-6.7109 \times 10^{7} & -3.4897 \times 10^{9} & 6.9793 \times 10^{9} & -6.9793 \times 10^{9}
\end{pmatrix}
\]

Then when you take the QR factorization of this, you find $Q =$

\[
\begin{pmatrix}
-0.66665 & 0.60555 & -0.40745 & 0.15121 \\
-0.66665 & -0.3054 & 0.44660 & -0.51269 \\
-0.33333 & -0.59253 & -6.4112 \times 10^{-2} & 0.73054 \\
-5.9523 \times 10^{-3} & -0.43468 & -0.79399 & -0.42496
\end{pmatrix}
\]


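Then you look at

\[
Q^T
\begin{pmatrix}
4 & -8 & 8 & -4 \\
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0
\end{pmatrix}
Q
\]

which yields

\[
\begin{pmatrix}
0.65278 & -1.5409 & 1.8161 & -8.1011 \\
0.75789 & 1.3823 & -1.6501 & 7.3584 \\
-9.699 \times 10^{-\cdots} & 1.0434 \times 10^{-3} & 0.87504 & -5.4711 \\
1.6355 \times 10^{-\cdots} & 7.2498 \times 10^{-\cdots} & 0.17840 & 1.0899
\end{pmatrix}
\]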



Of course the entries in the bottom left should all be 0. They aren't because of round off error. The only other entry of importance is $1.0434 \times 10^{-3}$ which is small. Hence the eigenvalues are close to the eigenvalues of the two blocks

\[
\begin{pmatrix}
0.65278 & -1.5409 \\
0.75789 & 1.3823
\end{pmatrix},
\quad
\begin{pmatrix}
0.87504 & -5.4711 \\
0.17840 & 1.0899
\end{pmatrix}
\]

This yields

\[
0.98247 + 0.98209i, \quad 0.98247 - 0.98209i, \quad 1.0175 + 1.0172i, \quad 1.0175 - 1.0172i.
\]

The exact solutions are $1 + i, 1 + i, 1 - i$, and $1 - i$. You could of course use the shifted inverse power method to get closer and to also obtain eigenvectors for the matrix.
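
The computation in this example can be repeated with NumPy. This is a sketch under the assumption that `numpy.linalg.qr` is an acceptable substitute for whatever software produced the numbers above, so signs of columns of $Q$ and the trailing digits may differ from those shown.

```python
import numpy as np

# Companion matrix of x^4 - 4x^3 + 8x^2 - 8x + 4.
A = np.array([[4.0, -8.0, 8.0, -4.0],
              [1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])

# QR factorization of the 55th power, then the similar matrix Q^T A Q.
Q, R = np.linalg.qr(np.linalg.matrix_power(A, 55))
T = Q.T @ A @ Q

# Treat T as block upper triangular and take eigenvalues of the two diagonal blocks.
top = np.linalg.eigvals(T[:2, :2])
bottom = np.linalg.eigvals(T[2:, 2:])
print(np.round(T, 5))
print(top, bottom)
```

The block eigenvalues are only rough approximations to $1 \pm i$ because the entry just below the upper left block is small but not zero, and repeated roots are sensitive to such perturbations.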

Example 15.4.8 Find the eigenvalues for the symmetric matrix

\[
A = \begin{pmatrix}
1 & 2 & 3 & -1 \\
2 & 0 & 1 & 3 \\
3 & 1 & 3 & 2 \\
-1 & 3 & 2 & 1
\end{pmatrix}
\]

Also find an eigenvector.



I should work with a matrix which is upper Hessenberg which is similar to the given matrix. 
However, I don't feel like it. It is easier to just raise to a power and see if things work. This is what 
I will do. 



\[
\begin{pmatrix}
1 & 2 & 3 & -1 \\
2 & 0 & 1 & 3 \\
3 & 1 & 3 & 2 \\
-1 & 3 & 2 & 1
\end{pmatrix}^{35}
=
\begin{pmatrix}
1.2091 \times 10^{28} & 1.1188 \times 10^{28} & 1.8769 \times 10^{28} & 1.0458 \times 10^{28} \\
1.1188 \times 10^{28} & 1.0353 \times 10^{28} & 1.7369 \times 10^{28} & 9.6774 \times 10^{27} \\
1.8769 \times 10^{28} & 1.7369 \times 10^{28} & 2.9137 \times 10^{28} & 1.6235 \times 10^{28} \\
1.0458 \times 10^{28} & 9.6774 \times 10^{27} & 1.6235 \times 10^{28} & 9.0455 \times 10^{27}
\end{pmatrix}
\]

Now take the QR factorization of this. When you do, $Q =$

\[
\begin{pmatrix}
0.44659 & -0.73757 & -0.43504 & 0.25939 \\
0.41324 & -0.10617 & 0.82507 & 0.37043 \\
0.69324 & 0.64097 & -0.31212 & 0.10557 \\
0.38627 & -0.18403 & 0.18047 & -0.88564
\end{pmatrix}
\]



Thus you look at $Q^TAQ$, a matrix which is similar to $A$ and which equals

\[
\begin{pmatrix}
6.6429 & 1.6379 \times 10^{-\cdots} & 1.1955 \times 10^{-\cdots} & 1.3356 \times 10^{-\cdots} \\
1.6379 \times 10^{-\cdots} & -1.4751 & -1.1349 & -1.6375 \\
1.1955 \times 10^{-\cdots} & -1.1349 & 0.20311 & -2.5077 \\
1.3356 \times 10^{-\cdots} & -1.6375 & -2.5077 & -0.37101
\end{pmatrix}
\]

It follows that the eigenvalues are approximately 6.6429 and the eigenvalues of the $3 \times 3$ matrix in the lower right corner. I shall use the same technique to find its eigenvalues.



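\[
B^{37} =
\begin{pmatrix}
-1.4751 & -1.1349 & -1.6375 \\
-1.1349 & 0.20311 & -2.5077 \\
-1.6375 & -2.5077 & -0.37101
\end{pmatrix}^{37}
=
\begin{pmatrix}
-1.7386 \times 10^{22} & -1.4839 \times 10^{22} & -1.7605 \times 10^{22} \\
-1.4839 \times 10^{22} & -1.2665 \times 10^{22} & -1.5026 \times 10^{22} \\
-1.7605 \times 10^{22} & -1.5026 \times 10^{22} & -1.7827 \times 10^{22}
\end{pmatrix}
\]

Then take the QR factorization of this to get $Q =$

\[
\begin{pmatrix}
-0.6026 & -6.1158 \times 10^{-2} & 0.79569 \\
-0.51432 & 0.79213 & -0.32863 \\
-0.61020 & -0.60728 & -0.50880
\end{pmatrix}
\]

Then you look at $Q^TBQ$ which equals

\[
\begin{pmatrix}
-4.1018 & -3.8485 \times 10^{-5} & 2.0234 \times 10^{-5} \\
-3.8485 \times 10^{-5} & 2.3861 & 0.41667 \\
2.0234 \times 10^{-5} & 0.41667 & 7.2759 \times 10^{-2}
\end{pmatrix}
\]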



Thus the eigenvalues are approximately $6.6429$, $-4.1018$, and the eigenvalues of the matrix in the lower right corner in the above. These are $2.4589$ and $-1.480 \times 10^{-6}$. In fact, 0 is an eigenvalue of the original matrix and it is being approximated by $-1.480 \times 10^{-6}$. To summarize, the eigenvalues are approximately

\[
6.6429, \quad -4.1018, \quad 2.4589, \quad 0.
\]

Of course you could now use the shifted inverse power method to find these more exactly if desired and also to find the eigenvectors.

If you wanted to find an eigenvector, you could start with one of these approximate eigenvalues and use the shifted inverse power method to get an eigenvector. For example, pick 6.6429.

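\[
\left(
\begin{pmatrix}
1 & 2 & 3 & -1 \\
2 & 0 & 1 & 3 \\
3 & 1 & 3 & 2 \\
-1 & 3 & 2 & 1
\end{pmatrix}
- 6.6429
\begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
\right)^{-1}
=
\begin{pmatrix}
3189.3 & 2951.5 & 4951.3 & 2758.8 \\
2951.5 & 2731.1 & 4581.9 & 2552.9 \\
4951.3 & 4581.9 & 7686.2 & 4282.7 \\
2758.8 & 2552.9 & 4282.7 & 2386.0
\end{pmatrix}
\]

\[
\begin{pmatrix}
3189.3 & 2951.5 & 4951.3 & 2758.8 \\
2951.5 & 2731.1 & 4581.9 & 2552.9 \\
4951.3 & 4581.9 & 7686.2 & 4282.7 \\
2758.8 & 2552.9 & 4282.7 & 2386.0
\end{pmatrix}^{5}
\begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix}
=
\begin{pmatrix}
9.0619 \times 10^{20} \\
8.3857 \times 10^{20} \\
1.4068 \times 10^{21} \\
7.8381 \times 10^{20}
\end{pmatrix}
\]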



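So try for the next approximation

\[
\begin{pmatrix}
9.0619 \times 10^{20} \\
8.3857 \times 10^{20} \\
1.4068 \times 10^{21} \\
7.8381 \times 10^{20}
\end{pmatrix}
\frac{1}{1.4068 \times 10^{21}}
=
\begin{pmatrix} 0.64415 \\ 0.59608 \\ 1.0 \\ 0.55716 \end{pmatrix}
\]

Next one is

\[
\begin{pmatrix}
3189.3 & 2951.5 & 4951.3 & 2758.8 \\
2951.5 & 2731.1 & 4581.9 & 2552.9 \\
4951.3 & 4581.9 & 7686.2 & 4282.7 \\
2758.8 & 2552.9 & 4282.7 & 2386.0
\end{pmatrix}
\begin{pmatrix} 0.64415 \\ 0.59608 \\ 1.0 \\ 0.55716 \end{pmatrix}
=
\begin{pmatrix} 10302 \\ 9533.4 \\ 15993 \\ 8910.9 \end{pmatrix}
\]

and after scaling,

\[
\frac{1}{15993}
\begin{pmatrix} 10302 \\ 9533.4 \\ 15993 \\ 8910.9 \end{pmatrix}
=
\begin{pmatrix} 0.64416 \\ 0.59610 \\ 1.0 \\ 0.55718 \end{pmatrix}
\]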



This isn't changing by much from the above vector and so the scaling factor will be about 15993. Now solve

\[
\frac{1}{\lambda - 6.6429} = 15993
\]

The solution is 6.6430. The approximate eigenvector is what I just got. Let's check it.

\[
\begin{pmatrix}
1 & 2 & 3 & -1 \\
2 & 0 & 1 & 3 \\
3 & 1 & 3 & 2 \\
-1 & 3 & 2 & 1
\end{pmatrix}
\begin{pmatrix} 0.64416 \\ 0.59610 \\ 1.0 \\ 0.55718 \end{pmatrix}
=
\begin{pmatrix} 4.2792 \\ 3.9599 \\ 6.6429 \\ 3.7013 \end{pmatrix}
\]

\[
6.643
\begin{pmatrix} 0.64416 \\ 0.59610 \\ 1.0 \\ 0.55718 \end{pmatrix}
=
\begin{pmatrix} 4.2792 \\ 3.9599 \\ 6.643 \\ 3.7013 \end{pmatrix}
\]



This is clearly very close. This has essentially found the eigenvalues and an eigenvector for the 
largest eigenvalue. 
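
Here is a minimal NumPy sketch of the shifted inverse power method used in this example. It differs from the hand computation in two ways that are choices made for the sketch: it solves a linear system at each step instead of forming the inverse, and it rescales after every step rather than after a fifth power. The function name `shifted_inverse_power` is hypothetical.

```python
import numpy as np

def shifted_inverse_power(A, shift, x0, iterations=10):
    """Approximate the eigenvalue of A closest to shift, with an eigenvector."""
    n = A.shape[0]
    M = A - shift * np.eye(n)
    x = np.array(x0, dtype=float)
    scale = 1.0
    for _ in range(iterations):
        y = np.linalg.solve(M, x)          # y = (A - shift I)^{-1} x
        scale = y[np.argmax(np.abs(y))]    # entry of largest absolute value
        x = y / scale                      # rescale so that entry becomes 1
    # scale approximates 1 / (lambda - shift), so solve for lambda.
    return shift + 1.0 / scale, x

A = np.array([[1.0, 2.0, 3.0, -1.0],
              [2.0, 0.0, 1.0, 3.0],
              [3.0, 1.0, 3.0, 2.0],
              [-1.0, 3.0, 2.0, 1.0]])
lam, v = shifted_inverse_power(A, 6.6429, np.ones(4))
print(lam, v)
```

Starting from the other approximate eigenvalues in place of 6.6429 should give the remaining eigenvectors in the same way.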
15.5 Exercises

1. Using the power method, find the eigenvalue correct to one decimal place having largest absolute value for the matrix $A = \begin{pmatrix} 0 & -4 & -4 \\ 7 & 10 & 5 \\ -2 & 0 & 6 \end{pmatrix}$ along with an eigenvector associated with this eigenvalue.

2. Using the power method, find the eigenvalue correct to one decimal place having largest absolute value for the matrix $A = \begin{pmatrix} 15 & 6 & 1 \\ -5 & 2 & 1 \\ 1 & 2 & 1 \end{pmatrix}$ along with an eigenvector associated with this eigenvalue.

3. Using the power method, find the eigenvalue correct to one decimal place having largest absolute value for the matrix $A = \begin{pmatrix} 10 & 4 & 2 \\ -3 & 2 & -1 \\ 0 & 0 & 4 \end{pmatrix}$ along with an eigenvector associated with this eigenvalue.

4. Using the power method, find the eigenvalue correct to one decimal place having largest absolute value for the matrix $A = \begin{pmatrix} 15 & 14 & -3 \\ -13 & -18 & 9 \\ 5 & 10 & -1 \end{pmatrix}$ along with an eigenvector associated with this eigenvalue.

5. In Example 15.3.3 an eigenvalue was found correct to several decimal places along with an 
eigenvector. Find the other eigenvalues along with their eigenvectors. 

6. Find the eigenvalues and eigenvectors of the matrix $A = \begin{pmatrix} 3 & 2 & 1 \\ 2 & 1 & 3 \\ 1 & 3 & 2 \end{pmatrix}$ numerically. In this case the exact eigenvalues are $\pm\sqrt{3}, 6$. Compare with the exact answers.

7. Find the eigenvalues and eigenvectors of the matrix $A = \begin{pmatrix} 3 & 2 & 1 \\ 2 & 5 & 3 \\ 1 & 3 & 2 \end{pmatrix}$ numerically. The exact eigenvalues are $2, 4 + \sqrt{15}, 4 - \sqrt{15}$. Compare your numerical results with the exact values. Is it much fun to compute the exact eigenvectors?

8. Find the eigenvalues and eigenvectors of the matrix $A = \begin{pmatrix} 0 & 2 & 1 \\ 2 & 5 & 3 \\ 1 & 3 & 2 \end{pmatrix}$ numerically. We don't know the exact eigenvalues in this case. Check your answers by multiplying your numerically computed eigenvectors by the matrix.

9. Find the eigenvalues and eigenvectors of the matrix $A = \begin{pmatrix} 0 & 2 & 1 \\ 2 & 0 & 3 \\ 1 & 3 & 2 \end{pmatrix}$ numerically. We don't know the exact eigenvalues in this case. Check your answers by multiplying your numerically computed eigenvectors by the matrix.

10. Consider the matrix $A = \begin{pmatrix} 3 & 2 & 3 \\ 2 & 1 & 4 \\ 3 & 4 & 0 \end{pmatrix}$ and the vector $(1, 1, 1)^T$. Estimate the distance between the Rayleigh quotient determined by this vector and some eigenvalue of $A$.

11. Consider the matrix $A = \begin{pmatrix} 1 & 2 & 1 \\ 2 & 1 & 4 \\ 1 & 4 & 5 \end{pmatrix}$ and the vector $(1, 1, 1)^T$. Estimate the distance between the Rayleigh quotient determined by this vector and some eigenvalue of $A$.



12. Using Gerschgorin's theorem, find upper and lower bounds for the eigenvalues of 




13. The QR algorithm works very well on general matrices. Try the QR algorithm on the following 
matrix which happens to have some complex eigenvalues. 




Use the QR algorithm to get approximate eigenvalues and then use the shifted inverse power 
method on one of these to get an approximate eigenvector for one of the complex eigenvalues. 

14. Use the QR algorithm to approximate the eigenvalues of the symmetric matrix 




15. Try to find the eigenvalues of the matrix $\begin{pmatrix} 3 & 3 & 1 \\ -2 & -2 & -1 \\ 0 & 1 & 0 \end{pmatrix}$ using the QR algorithm. It has eigenvalues $1, i, -i$. You will see the algorithm won't work well.

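16. Let $q(\lambda) = a_0 + a_1\lambda + \cdots + a_{n-1}\lambda^{n-1} + \lambda^n$. Now consider the companion matrix,

\[
C \equiv \begin{pmatrix}
0 & \cdots & 0 & -a_0 \\
1 & 0 & & -a_1 \\
 & \ddots & \ddots & \vdots \\
0 & & 1 & -a_{n-1}
\end{pmatrix}
\]

Show that $q(\lambda)$ is the characteristic equation for $C$. Thus the roots of $q(\lambda)$ are the eigenvalues of $C$. You can prove something similar for

\[
C = \begin{pmatrix}
-a_{n-1} & -a_{n-2} & \cdots & -a_0 \\
1 & & & 0 \\
 & \ddots & & \vdots \\
0 & & 1 & 0
\end{pmatrix}
\]

Hint: The characteristic equation is

\[
\det \begin{pmatrix}
\lambda & \cdots & 0 & a_0 \\
-1 & \lambda & & a_1 \\
 & \ddots & \ddots & \vdots \\
0 & & -1 & \lambda + a_{n-1}
\end{pmatrix}
\]

Expand along the first column. Thus

\[
\lambda
\begin{vmatrix}
\lambda & \cdots & 0 & a_1 \\
-1 & \lambda & & a_2 \\
 & \ddots & \ddots & \vdots \\
0 & & -1 & \lambda + a_{n-1}
\end{vmatrix}
+
\begin{vmatrix}
0 & \cdots & 0 & a_0 \\
-1 & \lambda & & a_2 \\
 & \ddots & \ddots & \vdots \\
0 & & -1 & \lambda + a_{n-1}
\end{vmatrix}
\]

Now use induction on the first term and for the second, note that you can expand along the top row to get

\[
(-1)^{n-2} a_0 (-1)^{n} = a_0.
\]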

17. Suppose $A$ is a real symmetric, invertible, matrix, or more generally one which has real eigenvalues. Then as described above, it is typically the case that

\[
A^p = Q_1 R
\]

and

\[
Q_1^T A Q_1 = \begin{pmatrix} a_1 & \mathbf{b}_1^T \\ \mathbf{e}_1 & A_1 \end{pmatrix}
\]

where $\mathbf{e}_1$ is very small. Then you can do the same thing with $A_1$ to obtain another smaller orthogonal matrix $Q_2$ such that

\[
Q_2^T A_1 Q_2 = \begin{pmatrix} a_2 & \mathbf{b}_2^T \\ \mathbf{e}_2 & A_2 \end{pmatrix}
\]

Explain why

\[
\begin{pmatrix} 1 & \mathbf{0}^T \\ \mathbf{0} & Q_2 \end{pmatrix}^T
Q_1^T A Q_1
\begin{pmatrix} 1 & \mathbf{0}^T \\ \mathbf{0} & Q_2 \end{pmatrix}
=
\begin{pmatrix}
a_1 & \mathbf{b}_1^T Q_2 \\
Q_2^T \mathbf{e}_1 & \begin{matrix} a_2 & \mathbf{b}_2^T \\ \mathbf{e}_2 & A_2 \end{matrix}
\end{pmatrix}
\]

where the $\mathbf{e}_i$ are very small. Explain why one can construct an orthogonal matrix $Q$ such that

\[
Q^T A Q = (T + E)
\]

where $T$ is an upper triangular matrix and $E$ is very small. In case $A$ is symmetric, explain why $T$ is actually a diagonal matrix. Next explain why, in the case of $A$ symmetric, the columns of $Q$ are an orthonormal basis of vectors, each of which is close to an eigenvector. Thus this will compute, not just the eigenvalues but also the eigenvectors.

18. Explain how one could use the QR algorithm or the above procedure to compute the singular value decomposition of an arbitrary real $m \times n$ matrix. In fact, there are other algorithms which will work better.

Vector Spaces



16.1 Algebraic Considerations 

16.1.1 The Definition 

It is time to consider the idea of an abstract vector space.

Definition 16.1.1 A vector space is an Abelian group of "vectors" denoted here by bold face letters, satisfying the axioms of an Abelian group,

\[
\mathbf{v} + \mathbf{w} = \mathbf{w} + \mathbf{v},
\]

the commutative law of addition,

\[
(\mathbf{v} + \mathbf{w}) + \mathbf{z} = \mathbf{v} + (\mathbf{w} + \mathbf{z}),
\]

the associative law for addition,

\[
\mathbf{v} + \mathbf{0} = \mathbf{v},
\]

the existence of an additive identity,

\[
\mathbf{v} + (-\mathbf{v}) = \mathbf{0},
\]

the existence of an additive inverse, along with a field of "scalars" $\mathbb{F}$ which are allowed to multiply the vectors according to the following rules. (The Greek letters denote scalars.)

\[
\alpha(\mathbf{v} + \mathbf{w}) = \alpha\mathbf{v} + \alpha\mathbf{w}, \tag{16.1}
\]

\[
(\alpha + \beta)\mathbf{v} = \alpha\mathbf{v} + \beta\mathbf{v}, \tag{16.2}
\]

\[
\alpha(\beta\mathbf{v}) = \alpha\beta(\mathbf{v}), \tag{16.3}
\]

\[
1\mathbf{v} = \mathbf{v}. \tag{16.4}
\]

The field of scalars is usually $\mathbb{R}$ or $\mathbb{C}$ and the vector space will be called real or complex depending on whether the field is $\mathbb{R}$ or $\mathbb{C}$. However, other fields are also possible. For example, one could use the field of rational numbers or even the field of the integers mod $p$ for $p$ a prime. A vector space is also called a linear space.

For example, $\mathbb{R}^n$ with the usual conventions is an example of a real vector space and $\mathbb{C}^n$ is an example of a complex vector space. Up to now, the discussion has been for $\mathbb{R}^n$ or $\mathbb{C}^n$ and all that is taking place is an increase in generality and abstraction. We no longer know what the vectors are. We also have no idea what field is being considered, at least in the next section.

If you are interested in considering other fields, you should have some examples other than $\mathbb{C}, \mathbb{R}, \mathbb{Q}$. Some of these are discussed in the following exercises. If you are happy with only considering $\mathbb{R}$ and $\mathbb{C}$, skip these exercises.
16.2 Exercises

1. Prove the Euclidean algorithm: If $m, n$ are positive integers, then there exist integers $q, r \geq 0$ such that $r < m$ and

\[
n = qm + r
\]

Hint: You might try considering

\[
S \equiv \{n - km : k \in \mathbb{N} \text{ and } n - km < 0\}
\]

and picking the largest integer in $S$ or something like this.

2. ↑The greatest common divisor of two positive integers $m, n$, denoted as $q$, is a positive number which divides both $m$ and $n$ and if $p$ is any other positive number which divides both $m, n$, then $p$ divides $q$. Recall what it means for $p$ to divide $q$. It means that $q = pk$ for some integer $k$. Show that the greatest common divisor of $m, n$ is the smallest positive integer in the set $S$

\[
S \equiv \{xm + yn : x, y \in \mathbb{Z} \text{ and } xm + yn > 0\}
\]

Two positive integers are called relatively prime if their greatest common divisor is 1.

3. ↑A positive integer larger than 1 is called a prime number if the only positive numbers which divide it are 1 and itself. Thus 2, 3, 5, 7, etc. are prime numbers. If $m$ is a positive integer and $p$ does not divide $m$ where $p$ is a prime number, show that $p$ and $m$ are relatively prime.

4. ↑There are lots of fields. This will give an example of a finite field. Let $\mathbb{Z}$ denote the set of integers. Thus $\mathbb{Z} = \{\cdots, -3, -2, -1, 0, 1, 2, 3, \cdots\}$. Also let $p$ be a prime number. We will say that two integers $a, b$ are equivalent and write $a \sim b$ if $a - b$ is divisible by $p$. Thus they are equivalent if $a - b = px$ for some integer $x$. First show that $a \sim a$. Next show that if $a \sim b$ then $b \sim a$. Finally show that if $a \sim b$ and $b \sim c$ then $a \sim c$. For $a$ an integer, denote by $[a]$ the set of all integers which are equivalent to $a$, the equivalence class of $a$. Show first that it suffices to consider only $[a]$ for $a = 0, 1, 2, \cdots, p - 1$ and that for $0 \leq a < b \leq p - 1$, $[a] \neq [b]$. That is, $[a] = [r]$ where $r \in \{0, 1, 2, \cdots, p - 1\}$. Thus there are exactly $p$ of these equivalence classes. Hint: Recall the Euclidean algorithm. For $a > 0$, $a = mp + r$ where $r < p$. Next define the following operations.

\[
[a] + [b] \equiv [a + b]
\]
\[
[a][b] \equiv [ab]
\]

Show these operations are well defined. That is, if $[a] = [a']$ and $[b] = [b']$, then $[a] + [b] = [a'] + [b']$ with a similar conclusion holding for multiplication. Thus for addition you need to verify $[a + b] = [a' + b']$ and for multiplication you need to verify $[ab] = [a'b']$. For example, if $p = 5$ you have $[3] = [8]$ and $[2] = [7]$. Is $[2 \times 3] = [8 \times 7]$? Is $[2 + 3] = [8 + 7]$? Clearly so in this example because when you subtract, the result is divisible by 5. So why is this so in general? Now verify that $\{[0], [1], \cdots, [p - 1]\}$ with these operations is a field. This is called the integers modulo a prime and is written $\mathbb{Z}_p$. Since there are infinitely many primes $p$, it follows there are infinitely many of these finite fields. Hint: Most of the axioms are easy once you have shown the operations are well defined. The only two which are tricky are the ones which give the existence of the additive inverse and the multiplicative inverse. Of these, the first is not hard. $-[x] = [-x]$. Since $p$ is prime, there exist integers $x, y$ such that $1 = px + ky$ and so $1 - ky = px$ which says $1 \sim ky$ and so $[1] = [ky]$. Now you finish the argument. What is the multiplicative identity in this collection of equivalence classes?
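
If you would like to experiment with $\mathbb{Z}_p$ before doing the proofs, the arithmetic is easy to try in Python. The sketch below finds the integers $x, y$ with $1 = px + ky$ from the hint by the extended Euclidean algorithm. The function names `extended_gcd` and `inverse_mod` are made up for this illustration, and a computation of this kind is of course not a proof.

```python
def extended_gcd(a, b):
    """Return (g, x, y) with g = gcd(a, b) and a*x + b*y = g."""
    if b == 0:
        return a, 1, 0
    g, x, y = extended_gcd(b, a % b)
    # Here g = b*x + (a % b)*y, and a % b = a - (a // b)*b.
    return g, y, x - (a // b) * y

def inverse_mod(k, p):
    """Return a representative y in {0, ..., p-1} of the class [k]^{-1} in Z_p."""
    g, y, _ = extended_gcd(k % p, p)
    if g != 1:
        raise ValueError("k has no multiplicative inverse modulo p")
    return y % p

p = 5
print((2 * 3) % p == (8 * 7) % p)   # is [2 x 3] = [8 x 7] ?
print((2 + 3) % p == (8 + 7) % p)   # is [2 + 3] = [8 + 7] ?
print([(k, inverse_mod(k, p)) for k in range(1, p)])
```

Trying a value of `p` which is not prime shows where the argument in the hint breaks down, since `inverse_mod` then raises an error for some `k`.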
16.2.1 Linear Independence And Bases

Just as in the case of $\mathbb{F}^n$ one has a concept of subspace, linear independence, and bases.

Definition 16.2.1 If $\{\mathbf{v}_1, \cdots, \mathbf{v}_n\} \subseteq V$, a vector space, then

\[
\operatorname{span}(\mathbf{v}_1, \cdots, \mathbf{v}_n) \equiv \left\{ \sum_{i=1}^{n} \alpha_i \mathbf{v}_i : \alpha_i \in \mathbb{F} \right\}.
\]

A subset $W \subseteq V$ is said to be a subspace if it is also a vector space with the same field of scalars. Thus $W \subseteq V$ is a subspace if $a\mathbf{x} + b\mathbf{y} \in W$ whenever $a, b \in \mathbb{F}$ and $\mathbf{x}, \mathbf{y} \in W$. The span of a set of vectors as just described is an example of a subspace.

Definition 16.2.2 If $\{\mathbf{v}_1, \cdots, \mathbf{v}_n\} \subseteq V$, the set of vectors is linearly independent if

\[
\sum_{i=1}^{n} \alpha_i \mathbf{v}_i = \mathbf{0}
\]

implies

\[
\alpha_1 = \cdots = \alpha_n = 0
\]

and $\{\mathbf{v}_1, \cdots, \mathbf{v}_n\}$ is called a basis for $V$ if

\[
\operatorname{span}(\mathbf{v}_1, \cdots, \mathbf{v}_n) = V
\]

and $\{\mathbf{v}_1, \cdots, \mathbf{v}_n\}$ is linearly independent. The set of vectors is linearly dependent if it is not linearly independent.
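
For vectors in $\mathbb{R}^n$ the definition can be tested numerically. The sketch below places the vectors as the columns of a matrix and compares its rank with the number of vectors. It is only a floating point check using the default tolerance of `numpy.linalg.matrix_rank`, and the function name `is_linearly_independent` is an assumption made for this illustration.

```python
import numpy as np

def is_linearly_independent(vectors):
    """Numerical test: the vectors are independent when the column matrix has full column rank."""
    M = np.column_stack(vectors)
    return int(np.linalg.matrix_rank(M)) == len(vectors)

u = np.array([1.0, 2.0, 3.0])
v = np.array([0.0, 1.0, 1.0])
w = 2.0 * u - v                      # a linear combination of u and v

print(is_linearly_independent([u, v]))      # u and v alone
print(is_linearly_independent([u, v, w]))   # adding w creates a dependence
```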

The next theorem is called the exchange theorem. It is very important that you understand this 
theorem. There are two kinds of people who go further in linear algebra, those who understand 
this theorem and its corollary presented later and those who don't. Those who do understand these 
theorems are able to proceed and learn more linear algebra while those who don't are doomed to 
wander in the wilderness of confusion and sink into the swamp of despair. Therefore, I am giving 
multiple proofs. Try to understand at least one of them. Several amount to the same thing, just 
worded differently. Before giving the proof, here is some important notation. 

Notation 16.2.3 Let $\mathbf{w}_{ij} \in V$, a vector space and let $1 \leq i \leq r$ while $1 \leq j \leq s$. Thus these vectors can be listed in a rectangular array.

\[
\begin{matrix}
\mathbf{w}_{11} & \mathbf{w}_{12} & \cdots & \mathbf{w}_{1s} \\
\mathbf{w}_{21} & \mathbf{w}_{22} & \cdots & \mathbf{w}_{2s} \\
\vdots & \vdots & & \vdots \\
\mathbf{w}_{r1} & \mathbf{w}_{r2} & \cdots & \mathbf{w}_{rs}
\end{matrix}
\]

Then $\sum_{j=1}^{s} \sum_{i=1}^{r} \mathbf{w}_{ij}$ means to sum the vectors in each column and then to add the $s$ sums which result while $\sum_{i=1}^{r} \sum_{j=1}^{s} \mathbf{w}_{ij}$ means to sum the vectors in each row and then to add the $r$ sums which result. Either way you simply get the sum of all the vectors in the above array. This is because you can add vectors in any order and you get the same answer.

Theorem 16.2.4 Let $\{\mathbf{x}_1, \cdots, \mathbf{x}_r\}$ be a linearly independent set of vectors such that each $\mathbf{x}_i$ is in the span$\{\mathbf{y}_1, \cdots, \mathbf{y}_s\}$. Then $r \leq s$.

Proof 1: Let

\[
\mathbf{x}_k = \sum_{j=1}^{s} a_{jk} \mathbf{y}_j.
\]

If $r > s$, then the matrix $A = (a_{jk})$ has more columns than rows. By Corollary 8.2.8 one of these columns is a linear combination of the others. This implies there exist scalars $c_1, \cdots, c_r$, not all zero, such that

\[
\sum_{k=1}^{r} a_{jk} c_k = 0, \quad j = 1, \cdots, s.
\]

Then

\[
\sum_{k=1}^{r} c_k \mathbf{x}_k = \sum_{k=1}^{r} c_k \sum_{j=1}^{s} a_{jk} \mathbf{y}_j = \sum_{j=1}^{s} \left( \sum_{k=1}^{r} c_k a_{jk} \right) \mathbf{y}_j = \mathbf{0}
\]

which contradicts the assumption that $\{\mathbf{x}_1, \cdots, \mathbf{x}_r\}$ is linearly independent. Hence $r \leq s$. ■
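Proof 2: Define span$(\mathbf{y}_1, \cdots, \mathbf{y}_s) \equiv V$. It follows there exist scalars $c_1, \cdots, c_s$ such that

\[
\mathbf{x}_1 = \sum_{i=1}^{s} c_i \mathbf{y}_i. \tag{16.5}
\]

Not all of these scalars can equal zero because if this were the case, it would follow that $\mathbf{x}_1 = \mathbf{0}$ and so $\{\mathbf{x}_1, \cdots, \mathbf{x}_r\}$ would not be linearly independent. Indeed, if $\mathbf{x}_1 = \mathbf{0}$, $1\mathbf{x}_1 + \sum_{i=2}^{r} 0\mathbf{x}_i = \mathbf{x}_1 = \mathbf{0}$ and so there would exist a nontrivial linear combination of the vectors $\{\mathbf{x}_1, \cdots, \mathbf{x}_r\}$ which equals zero.

Say $c_k \neq 0$. Then solve (16.5) for $\mathbf{y}_k$ and obtain

\[
\mathbf{y}_k \in \operatorname{span}\Big( \mathbf{x}_1, \underbrace{\mathbf{y}_1, \cdots, \mathbf{y}_{k-1}, \mathbf{y}_{k+1}, \cdots, \mathbf{y}_s}_{s-1 \text{ vectors here}} \Big).
\]

Define $\{\mathbf{z}_1, \cdots, \mathbf{z}_{s-1}\}$ by

\[
\{\mathbf{z}_1, \cdots, \mathbf{z}_{s-1}\} \equiv \{\mathbf{y}_1, \cdots, \mathbf{y}_{k-1}, \mathbf{y}_{k+1}, \cdots, \mathbf{y}_s\}
\]

Therefore, span$(\mathbf{x}_1, \mathbf{z}_1, \cdots, \mathbf{z}_{s-1}) = V$ because if $\mathbf{v} \in V$, there exist constants $c_1, \cdots, c_s$ such that

\[
\mathbf{v} = \sum_{i=1}^{s-1} c_i \mathbf{z}_i + c_s \mathbf{y}_k.
\]

Replace the $\mathbf{y}_k$ in the above with a linear combination of the vectors

\[
\{\mathbf{x}_1, \mathbf{z}_1, \cdots, \mathbf{z}_{s-1}\}
\]

to obtain $\mathbf{v} \in \operatorname{span}(\mathbf{x}_1, \mathbf{z}_1, \cdots, \mathbf{z}_{s-1})$. The vector $\mathbf{y}_k$, in the list $\{\mathbf{y}_1, \cdots, \mathbf{y}_s\}$, has now been replaced with the vector $\mathbf{x}_1$ and the resulting modified list of vectors has the same span as the original list of vectors, $\{\mathbf{y}_1, \cdots, \mathbf{y}_s\}$.

Now suppose that $r > s$ and that span$(\mathbf{x}_1, \cdots, \mathbf{x}_l, \mathbf{z}_1, \cdots, \mathbf{z}_p) = V$, where the vectors $\mathbf{z}_1, \cdots, \mathbf{z}_p$ are each taken from the set $\{\mathbf{y}_1, \cdots, \mathbf{y}_s\}$ and $l + p = s$. This has now been done for $l = 1$ above. Then since $r > s$, it follows that $l \leq s < r$ and so $l + 1 \leq r$. Therefore, $\mathbf{x}_{l+1}$ is a vector not in the list $\{\mathbf{x}_1, \cdots, \mathbf{x}_l\}$ and since span$(\mathbf{x}_1, \cdots, \mathbf{x}_l, \mathbf{z}_1, \cdots, \mathbf{z}_p) = V$ there exist scalars $c_i$ and $d_j$ such that

\[
\mathbf{x}_{l+1} = \sum_{i=1}^{l} c_i \mathbf{x}_i + \sum_{j=1}^{p} d_j \mathbf{z}_j. \tag{16.6}
\]

Not all the $d_j$ can equal zero because if this were so, it would follow that $\{\mathbf{x}_1, \cdots, \mathbf{x}_r\}$ would be a linearly dependent set because one of the vectors would equal a linear combination of the others. Therefore, (16.6) can be solved for one of the $\mathbf{z}_i$, say $\mathbf{z}_k$, in terms of $\mathbf{x}_{l+1}$ and the other $\mathbf{z}_i$ and just as in the above argument, replace that $\mathbf{z}_i$ with $\mathbf{x}_{l+1}$ to obtain

\[
\operatorname{span}\Big( \mathbf{x}_1, \cdots, \mathbf{x}_l, \mathbf{x}_{l+1}, \underbrace{\mathbf{z}_1, \cdots, \mathbf{z}_{k-1}, \mathbf{z}_{k+1}, \cdots, \mathbf{z}_p}_{p-1 \text{ vectors here}} \Big) = V.
\]

Continue this way, eventually obtaining

\[
\operatorname{span}(\mathbf{x}_1, \cdots, \mathbf{x}_s) = V.
\]

But then $\mathbf{x}_r \in \operatorname{span}(\mathbf{x}_1, \cdots, \mathbf{x}_s)$ contrary to the assumption that $\{\mathbf{x}_1, \cdots, \mathbf{x}_r\}$ is linearly independent. Therefore, $r \leq s$ as claimed. ■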

Proof 3: Let $V \equiv \operatorname{span}(\mathbf{y}_1, \cdots, \mathbf{y}_s)$ and suppose $r > s$. Let $A_l \equiv \{\mathbf{x}_1, \cdots, \mathbf{x}_l\}$, $A_0 = \emptyset$, and let $B_{s-l}$ denote a subset of the vectors $\{\mathbf{y}_1, \cdots, \mathbf{y}_s\}$ which contains $s - l$ vectors and has the property that span$(A_l, B_{s-l}) = V$. Note that the assumption of the theorem says span$(A_0, B_s) = V$.

Now an exchange operation is given for span$(A_l, B_{s-l}) = V$. Since $r > s$, it follows $l < r$. Letting

\[
B_{s-l} \equiv \{\mathbf{z}_1, \cdots, \mathbf{z}_{s-l}\} \subseteq \{\mathbf{y}_1, \cdots, \mathbf{y}_s\},
\]

it follows there exist constants $c_i$ and $d_i$ such that

\[
\mathbf{x}_{l+1} = \sum_{i=1}^{l} c_i \mathbf{x}_i + \sum_{i=1}^{s-l} d_i \mathbf{z}_i,
\]

and not all the $d_i$ can equal zero. (If they were all equal to zero, it would follow that the set $\{\mathbf{x}_1, \cdots, \mathbf{x}_r\}$ would be dependent since one of the vectors in it would be a linear combination of the others.)

Let $d_k \neq 0$. Then $\mathbf{z}_k$ can be solved for as follows.

\[
\mathbf{z}_k = \frac{1}{d_k} \mathbf{x}_{l+1} - \sum_{i=1}^{l} \frac{c_i}{d_k} \mathbf{x}_i - \sum_{i \neq k} \frac{d_i}{d_k} \mathbf{z}_i
\]

This implies $V = \operatorname{span}(A_{l+1}, B_{s-l-1})$, where $B_{s-l-1} \equiv B_{s-l} \setminus \{\mathbf{z}_k\}$, a set obtained by deleting $\mathbf{z}_k$ from $B_{s-l}$. You see, the process exchanged a vector in $B_{s-l}$ with one from $\{\mathbf{x}_1, \cdots, \mathbf{x}_r\}$ and kept the span the same. Starting with $V = \operatorname{span}(A_0, B_s)$, do the exchange operation until $V = \operatorname{span}(A_{s-1}, \mathbf{z})$ where $\mathbf{z} \in \{\mathbf{y}_1, \cdots, \mathbf{y}_s\}$. Then one more application of the exchange operation yields $V = \operatorname{span}(A_s)$. But this implies $\mathbf{x}_r \in \operatorname{span}(A_s) = \operatorname{span}(\mathbf{x}_1, \cdots, \mathbf{x}_s)$, contradicting the linear independence of $\{\mathbf{x}_1, \cdots, \mathbf{x}_r\}$. It follows that $r \leq s$ as claimed. ■

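Proof 4: Suppose $r > s$. Let $\mathbf{z}_k$ denote a vector of $\{\mathbf{y}_1, \cdots, \mathbf{y}_s\}$. Thus there exists $j$ as small as possible such that

\[
\operatorname{span}(\mathbf{y}_1, \cdots, \mathbf{y}_s) = \operatorname{span}(\mathbf{x}_1, \cdots, \mathbf{x}_m, \mathbf{z}_1, \cdots, \mathbf{z}_j)
\]

where $m + j = s$. It is given that $m = 0$, corresponding to no vectors of $\{\mathbf{x}_1, \cdots, \mathbf{x}_m\}$ and $j = s$, corresponding to all the $\mathbf{y}_k$ results in the above equation holding. If $j > 0$ then $m < s$ and so

\[
\mathbf{x}_{m+1} = \sum_{k=1}^{m} a_k \mathbf{x}_k + \sum_{i=1}^{j} b_i \mathbf{z}_i
\]

Not all the $b_i$ can equal 0 and so you can solve for one of the $\mathbf{z}_i$ in terms of

\[
\mathbf{x}_{m+1}, \mathbf{x}_m, \cdots, \mathbf{x}_1,
\]

and the other $\mathbf{z}_k$. Therefore, there exists

\[
\{\mathbf{z}_1, \cdots, \mathbf{z}_{j-1}\} \subseteq \{\mathbf{y}_1, \cdots, \mathbf{y}_s\}
\]

such that

\[
\operatorname{span}(\mathbf{y}_1, \cdots, \mathbf{y}_s) = \operatorname{span}(\mathbf{x}_1, \cdots, \mathbf{x}_{m+1}, \mathbf{z}_1, \cdots, \mathbf{z}_{j-1})
\]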




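contradicting the choice of $j$. Hence $j = 0$ and

\[
\operatorname{span}(\mathbf{y}_1, \cdots, \mathbf{y}_s) = \operatorname{span}(\mathbf{x}_1, \cdots, \mathbf{x}_s)
\]

It follows that

\[
\mathbf{x}_{s+1} \in \operatorname{span}(\mathbf{x}_1, \cdots, \mathbf{x}_s)
\]

contrary to the assumption the $\mathbf{x}_k$ are linearly independent. Therefore, $r \leq s$ as claimed. ■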

Corollary 16.2.5 If $\{\mathbf{u}_1, \cdots, \mathbf{u}_m\}$ and $\{\mathbf{v}_1, \cdots, \mathbf{v}_n\}$ are two bases for $V$, then $m = n$.

Proof: By Theorem 16.2.4, $m \leq n$ and $n \leq m$. ■

This corollary is very important so here is another proof of it given independent of the exchange 
theorem above. 

Theorem 16.2.6 Let $V$ be a vector space and suppose $\{\mathbf{u}_1, \cdots, \mathbf{u}_k\}$ and $\{\mathbf{v}_1, \cdots, \mathbf{v}_m\}$ are two bases for $V$. Then $k = m$.

Proof: Suppose $k > m$. Then since the vectors $\{\mathbf{v}_1, \cdots, \mathbf{v}_m\}$ span $V$, there exist scalars $c_{ij}$ such that

\[
\sum_{i=1}^{m} c_{ij} \mathbf{v}_i = \mathbf{u}_j.
\]

Therefore,

\[
\sum_{j=1}^{k} d_j \mathbf{u}_j = \mathbf{0} \text{ if and only if } \sum_{j=1}^{k} \sum_{i=1}^{m} c_{ij} d_j \mathbf{v}_i = \mathbf{0}
\]

if and only if

\[
\sum_{i=1}^{m} \left( \sum_{j=1}^{k} c_{ij} d_j \right) \mathbf{v}_i = \mathbf{0}
\]

Now since $\{\mathbf{v}_1, \cdots, \mathbf{v}_m\}$ is independent, this happens if and only if

\[
\sum_{j=1}^{k} c_{ij} d_j = 0, \quad i = 1, 2, \cdots, m.
\]

However, this is a system of $m$ equations in $k$ variables, $d_1, \cdots, d_k$ and $m < k$. Therefore, there exists a solution to this system of equations in which not all the $d_j$ are equal to zero. Recall why this is so. The augmented matrix for the system is of the form $\begin{pmatrix} C & \mathbf{0} \end{pmatrix}$ where $C$ is a matrix which has more columns than rows. Therefore, there are free variables and hence nonzero solutions to the system of equations. However, this contradicts the linear independence of $\{\mathbf{u}_1, \cdots, \mathbf{u}_k\}$ because, as explained above, $\sum_{j=1}^{k} d_j \mathbf{u}_j = \mathbf{0}$. Similarly it cannot happen that $m > k$. ■

Definition 16.2.7 A vector space $V$ is of dimension $n$ if it has a basis consisting of $n$ vectors. This is well defined thanks to Corollary 16.2.5. It is always assumed here that $n < \infty$ and in this case, such a vector space is said to be finite dimensional.

Theorem 16.2.8 If $V = \operatorname{span}(\mathbf{u}_1, \cdots, \mathbf{u}_n)$ then some subset of $\{\mathbf{u}_1, \cdots, \mathbf{u}_n\}$ is a basis for $V$. Also, if $\{\mathbf{u}_1, \cdots, \mathbf{u}_k\} \subseteq V$ is linearly independent and the vector space is finite dimensional, then the set $\{\mathbf{u}_1, \cdots, \mathbf{u}_k\}$ can be enlarged to obtain a basis of $V$.

Proof: Let

\[
S = \{E \subseteq \{\mathbf{u}_1, \cdots, \mathbf{u}_n\} \text{ such that } \operatorname{span}(E) = V\}.
\]

For $E \in S$, let $|E|$ denote the number of elements of $E$. Let

\[
m \equiv \min\{|E| \text{ such that } E \in S\}.
\]

Thus there exist vectors

\[
\{\mathbf{v}_1, \cdots, \mathbf{v}_m\} \subseteq \{\mathbf{u}_1, \cdots, \mathbf{u}_n\}
\]

such that

\[
\operatorname{span}(\mathbf{v}_1, \cdots, \mathbf{v}_m) = V
\]

and $m$ is as small as possible for this to happen. If this set is linearly independent, it follows it is a basis for $V$ and the theorem is proved. On the other hand, if the set is not linearly independent, then there exist scalars

\[
c_1, \cdots, c_m
\]

such that

\[
\mathbf{0} = \sum_{i=1}^{m} c_i \mathbf{v}_i
\]

and not all the $c_i$ are equal to zero. Suppose $c_k \neq 0$. Then the vector $\mathbf{v}_k$ may be solved for in terms of the other vectors. Consequently,

\[
V = \operatorname{span}(\mathbf{v}_1, \cdots, \mathbf{v}_{k-1}, \mathbf{v}_{k+1}, \cdots, \mathbf{v}_m)
\]

contradicting the definition of $m$. This proves the first part of the theorem.

To obtain the second part, begin with $\{\mathbf{u}_1, \cdots, \mathbf{u}_k\}$ and suppose a basis for $V$ is

\[
\{\mathbf{v}_1, \cdots, \mathbf{v}_n\}.
\]

If

\[
\operatorname{span}(\mathbf{u}_1, \cdots, \mathbf{u}_k) = V,
\]

then $k = n$. If not, there exists a vector

\[
\mathbf{u}_{k+1} \notin \operatorname{span}(\mathbf{u}_1, \cdots, \mathbf{u}_k).
\]

Then $\{\mathbf{u}_1, \cdots, \mathbf{u}_k, \mathbf{u}_{k+1}\}$ is also linearly independent. Continue adding vectors in this way until $n$ linearly independent vectors have been obtained. Then

\[
\operatorname{span}(\mathbf{u}_1, \cdots, \mathbf{u}_n) = V
\]

because if it did not do so, there would exist $\mathbf{u}_{n+1}$ as just described and $\{\mathbf{u}_1, \cdots, \mathbf{u}_{n+1}\}$ would be a linearly independent set of vectors having $n + 1$ elements even though $\{\mathbf{v}_1, \cdots, \mathbf{v}_n\}$ is a basis. This would contradict Theorem 16.2.4. Therefore, this list is a basis. ■
It is useful to emphasize some of the ideas used in the above proof. 

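Lemma 16.2.9 Suppose $\mathbf{v} \notin \operatorname{span}(\mathbf{u}_1, \cdots, \mathbf{u}_k)$ and $\{\mathbf{u}_1, \cdots, \mathbf{u}_k\}$ is linearly independent. Then $\{\mathbf{u}_1, \cdots, \mathbf{u}_k, \mathbf{v}\}$ is also linearly independent.

Proof: Suppose $\sum_{i=1}^{k} c_i \mathbf{u}_i + d\mathbf{v} = \mathbf{0}$. It is required to verify that each $c_i = 0$ and that $d = 0$. But if $d \neq 0$, then you can solve for $\mathbf{v}$ as a linear combination of the vectors $\{\mathbf{u}_1, \cdots, \mathbf{u}_k\}$,

\[
\mathbf{v} = -\sum_{i=1}^{k} \left( \frac{c_i}{d} \right) \mathbf{u}_i
\]

contrary to assumption. Therefore, $d = 0$. But then $\sum_{i=1}^{k} c_i \mathbf{u}_i = \mathbf{0}$ and the linear independence of $\{\mathbf{u}_1, \cdots, \mathbf{u}_k\}$ implies each $c_i = 0$ also. ■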
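
In $\mathbb{R}^n$ this lemma suggests a simple greedy procedure: go through a list of vectors and keep a vector only when it is not in the span of those already kept. Note this is not the minimal spanning subset argument used in the proof of Theorem 16.2.8, only a numerical sketch in the same spirit. The function name `select_basis` is hypothetical, and the span test relies on the floating point rank computed by `numpy.linalg.matrix_rank`.

```python
import numpy as np

def select_basis(vectors):
    """Keep each vector that is not (numerically) in the span of the vectors kept so far."""
    basis = []
    for v in vectors:
        candidate = basis + [v]
        # By the lemma, adding v keeps independence exactly when v is outside the span.
        if int(np.linalg.matrix_rank(np.column_stack(candidate))) == len(candidate):
            basis = candidate
    return basis

# A spanning list for a two dimensional subspace of R^3, with redundant vectors.
spanning = [np.array([1.0, 0.0, 0.0]), np.array([2.0, 0.0, 0.0]),
            np.array([0.0, 1.0, 0.0]), np.array([1.0, 1.0, 0.0])]
basis = select_basis(spanning)

# Enlarging an independent set to a basis of R^3: append the standard basis and filter.
extended = select_basis([np.array([1.0, 1.0, 0.0])] + list(np.eye(3)))
print(len(basis), len(extended))
```

The same function handles both halves of Theorem 16.2.8 in this setting: filtering a spanning list, and enlarging an independent list by appending a known basis before filtering.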

Theorem 16.2.10 Let $V$ be a nonzero subspace of a finite dimensional vector space $W$ of dimension $n$. Then $V$ has a basis with no more than $n$ vectors.

Proof: Let $\mathbf{v}_1 \in V$ where $\mathbf{v}_1 \neq \mathbf{0}$. If span$(\mathbf{v}_1) = V$, stop. $\{\mathbf{v}_1\}$ is a basis for $V$. Otherwise, there exists $\mathbf{v}_2 \in V$ which is not in span$(\mathbf{v}_1)$. By Lemma 16.2.9 $\{\mathbf{v}_1, \mathbf{v}_2\}$ is a linearly independent set of vectors. If span$(\mathbf{v}_1, \mathbf{v}_2) = V$ stop, $\{\mathbf{v}_1, \mathbf{v}_2\}$ is a basis for $V$. If span$(\mathbf{v}_1, \mathbf{v}_2) \neq V$, then there exists $\mathbf{v}_3 \notin \operatorname{span}(\mathbf{v}_1, \mathbf{v}_2)$ and $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ is a larger linearly independent set of vectors. Continuing this way, the process must stop before $n + 1$ steps because if not, it would be possible to obtain $n + 1$ linearly independent vectors contrary to the exchange theorem, Theorem 16.2.4. ■

16.3 Vector Spaces And Fields*

16.3.1 Irreducible Polynomials 

There exist very interesting examples of vector spaces which are not $\mathbb{F}^n$ and for which the field of
scalars is not R or C. This section gives a convincing application of the value of the above abstract 
theory and provides many further examples of fields. It is an optional section which you might read 
if you find it interesting. 

Here I will give some basic algebra relating to polynomials. This is interesting for its own sake 
but also provides the basis for constructing many different kinds of fields. The first is the Euclidean 
algorithm for polynomials. 

Definition 16.3.1 A polynomial is an expression of the form $p(\lambda) = \sum_{k=0}^{n} a_k \lambda^k$ where as usual $\lambda^0$ is defined to equal 1. Two polynomials are said to be equal if their corresponding coefficients are the same. Thus, in particular, $p(\lambda) = 0$ means each of the $a_k = 0$. An element of the field $\lambda$ is said to be a root of the polynomial if $p(\lambda) = 0$ in the sense that when you plug in $\lambda$ into the formula and do the indicated operations, you get 0. The degree of a nonzero polynomial is the highest exponent appearing on $\lambda$. The degree of the zero polynomial $p(\lambda) = 0$ is not defined. You add and multiply polynomials using the standard conventions learned in junior high school.

Example 16.3.2 Consider the polynomial $p(\lambda) = \lambda^2 + \lambda$ where the coefficients are in $\mathbb{Z}_2$. Is this polynomial equal to 0? Not according to the above definition, because its coefficients are not all equal to 0. However, $p(1) = p(0) = 0$ so it sends every element of $\mathbb{Z}_2$ to 0. Note the distinction between saying it sends everything in the field to 0 and having the polynomial be the zero polynomial.
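
The distinction is easy to see on a computer. The sketch below stores a polynomial as its list of coefficients and evaluates it modulo $p$ by Horner's rule. The function name `eval_poly_mod` and the coefficient ordering (constant term first) are conventions chosen for this illustration.

```python
def eval_poly_mod(coeffs, x, p):
    """Evaluate a_0 + a_1 x + ... + a_n x^n modulo p, where coeffs = [a_0, ..., a_n]."""
    result = 0
    for a in reversed(coeffs):       # Horner's rule, starting from the leading coefficient
        result = (result * x + a) % p
    return result

coeffs = [0, 1, 1]                   # lambda^2 + lambda, with coefficients in Z_2
values = [eval_poly_mod(coeffs, x, 2) for x in range(2)]
is_zero_polynomial = all(a % 2 == 0 for a in coeffs)
print(values, is_zero_polynomial)
```

The list of values records what the polynomial does as a function on $\mathbb{Z}_2$, while `is_zero_polynomial` looks only at the coefficients, which is what the definition of equality refers to.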

Lemma 16.3.3 Let $f(\lambda)$ and $g(\lambda) \neq 0$ be polynomials. Then there exist polynomials $q(\lambda)$ and $r(\lambda)$ such that

$$f(\lambda) = q(\lambda) g(\lambda) + r(\lambda)$$

where the degree of $r(\lambda)$ is less than the degree of $g(\lambda)$ or $r(\lambda) = 0$.

Proof: Consider the polynomials of the form $f(\lambda) - g(\lambda) l(\lambda)$. If one of these is the zero polynomial, take $q(\lambda) = l(\lambda)$ and $r(\lambda) = 0$ and there is nothing more to show. Otherwise, out of all these polynomials, pick one which has the smallest degree. This can be done because of the well ordering of the natural numbers. Let this take place when $l(\lambda) = q_1(\lambda)$ and let

$$r(\lambda) = f(\lambda) - g(\lambda) q_1(\lambda).$$

Since $f(\lambda) - g(\lambda) l(\lambda)$ is never equal to zero for any $l(\lambda)$, $r(\lambda) \neq 0$. It is required to show that the degree of $r(\lambda)$ is smaller than the degree of $g(\lambda)$. If this doesn't happen, then the degree of $r \geq$ the degree of $g$. Let

$$r(\lambda) = b_m \lambda^m + \cdots + b_1 \lambda + b_0$$
$$g(\lambda) = a_n \lambda^n + \cdots + a_1 \lambda + a_0$$

where $m \geq n$ and $b_m$ and $a_n$ are nonzero. Then let $r_1(\lambda)$ be given by

$$r_1(\lambda) = r(\lambda) - \frac{\lambda^{m-n} b_m}{a_n} g(\lambda)$$
$$= (b_m \lambda^m + \cdots + b_1 \lambda + b_0) - \frac{\lambda^{m-n} b_m}{a_n} (a_n \lambda^n + \cdots + a_1 \lambda + a_0)$$

which has smaller degree than $m$, the degree of $r(\lambda)$. But

$$r_1(\lambda) = f(\lambda) - g(\lambda) q_1(\lambda) - \frac{\lambda^{m-n} b_m}{a_n} g(\lambda) = f(\lambda) - g(\lambda) \left( q_1(\lambda) + \frac{\lambda^{m-n} b_m}{a_n} \right),$$

and this is not zero by the assumption that $f(\lambda) - g(\lambda) l(\lambda)$ is never equal to zero for any $l(\lambda)$, yet it has smaller degree than $r(\lambda)$, which is a contradiction to the choice of $r(\lambda)$. ■
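The proof suggests a procedure: keep subtracting $\frac{b_m}{a_n} \lambda^{m-n} g(\lambda)$ from the current remainder until its degree drops below that of $g$. Here is a minimal sketch of that procedure with rational coefficients. The function name `poly_divmod` and the convention of storing coefficients in a list with the constant term first are assumptions made for this illustration only.

```python
from fractions import Fraction

def poly_divmod(f, g):
    """Return (q, r) with f = q*g + r and either r == [] or deg r < deg g.

    Polynomials are lists of coefficients, constant term first; [] is zero.
    """
    f = [Fraction(c) for c in f]
    g = [Fraction(c) for c in g]
    while g and g[-1] == 0:
        g.pop()
    if not g:
        raise ZeroDivisionError("g must be a nonzero polynomial")
    q = [Fraction(0)] * max(len(f) - len(g) + 1, 1)
    r = f[:]
    while r and r[-1] == 0:
        r.pop()
    while len(r) >= len(g):
        shift = len(r) - len(g)          # m - n in the proof
        factor = r[-1] / g[-1]           # b_m / a_n in the proof
        q[shift] += factor
        for i, c in enumerate(g):
            r[i + shift] -= factor * c   # r <- r - factor * lambda^shift * g
        while r and r[-1] == 0:          # drop the cancelled leading terms
            r.pop()
    return q, r

# Divide x^3 + 3x^2 - x - 3 by x^2 + 3x + 2.
q, r = poly_divmod([-3, -1, 3, 1], [2, 3, 1])
print(q, r)
```

Exact rational arithmetic is used so that the leading terms cancel exactly; with floating point numbers the test `r[-1] == 0` would be unreliable.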

Now with this lemma, here is another one which is very fundamental. First here is a definition. A polynomial is monic means it is of the form

$$\lambda^n + c_{n-1} \lambda^{n-1} + \cdots + c_1 \lambda + c_0.$$

That is, the leading coefficient is 1. In what follows, the coefficients of polynomials are in $\mathbb{F}$, a field of scalars which is completely arbitrary. Think $\mathbb{R}$ if you need an example.

Definition 16.3.4 A polynomial $f$ is said to divide a polynomial $g$ if $g(\lambda) = f(\lambda) r(\lambda)$ for some polynomial $r(\lambda)$. Let $\{\phi_i(\lambda)\}$ be a finite set of polynomials. The greatest common divisor will be the monic polynomial $q$ such that $q(\lambda)$ divides each $\phi_i(\lambda)$ and if $p(\lambda)$ divides each $\phi_i(\lambda)$, then $p(\lambda)$ divides $q(\lambda)$. The finite set of polynomials $\{\phi_i\}$ is said to be relatively prime if their greatest common divisor is 1. A polynomial $f(\lambda)$ is irreducible if there is no polynomial with coefficients in $\mathbb{F}$ which divides it except nonzero scalar multiples of $f(\lambda)$ and constants.

Proposition 16.3.5 The greatest common divisor is unique.

Proof: Suppose both $q(\lambda)$ and $q'(\lambda)$ work. Then $q(\lambda)$ divides $q'(\lambda)$ and the other way around and so

$$q'(\lambda) = q(\lambda) l(\lambda), \quad q(\lambda) = l'(\lambda) q'(\lambda).$$

Therefore, the two must have the same degree. Hence $l'(\lambda), l(\lambda)$ are both constants. However, this constant must be 1 because both $q(\lambda)$ and $q'(\lambda)$ are monic. ■

Theorem 16.3.6 Let $\psi(\lambda)$ be the greatest common divisor of $\{\phi_i(\lambda)\}_{i=1}^{p}$, not all of which are zero polynomials. Then there exist polynomials $r_i(\lambda)$ such that

$$\psi(\lambda) = \sum_{i=1}^{p} r_i(\lambda) \phi_i(\lambda).$$

Furthermore, $\psi(\lambda)$ is the monic polynomial of smallest degree which can be written in the above form.

Proof: Let $S$ denote the set of monic polynomials which are of the form

$$\sum_{i=1}^{p} r_i(\lambda) \phi_i(\lambda)$$

where $r_i(\lambda)$ is a polynomial. Then $S \neq \emptyset$ because some $\phi_i(\lambda) \neq 0$. Then let the $r_i$ be chosen such that the degree of the expression $\sum_{i=1}^{p} r_i(\lambda) \phi_i(\lambda)$ is as small as possible. Letting $\psi(\lambda)$ equal this sum, it remains to verify it is the greatest common divisor. First, does it divide each $\phi_i(\lambda)$? Suppose it fails to divide $\phi_1(\lambda)$. Then by Lemma 16.3.3

$$\phi_1(\lambda) = \psi(\lambda) l(\lambda) + r(\lambda)$$

where $r(\lambda) \neq 0$ and the degree of $r(\lambda)$ is less than that of $\psi(\lambda)$. Then dividing $r(\lambda)$ by the leading coefficient if necessary and denoting the result by $\psi_1(\lambda)$, it follows the degree of $\psi_1(\lambda)$ is less than the degree of $\psi(\lambda)$ and $\psi_1(\lambda)$ equals

$$\psi_1(\lambda) = \left( \phi_1(\lambda) - \psi(\lambda) l(\lambda) \right) a$$
$$= \left( \phi_1(\lambda) - \sum_{i=1}^{p} r_i(\lambda) \phi_i(\lambda) l(\lambda) \right) a$$
$$= \left( (1 - r_1(\lambda) l(\lambda)) \phi_1(\lambda) + \sum_{i=2}^{p} (-r_i(\lambda) l(\lambda)) \phi_i(\lambda) \right) a$$

for a suitable $a \in \mathbb{F}$. This is one of the polynomials in $S$. Therefore, $\psi(\lambda)$ does not have the smallest degree after all because the degree of $\psi_1(\lambda)$ is smaller. This is a contradiction. Therefore, $\psi(\lambda)$ divides $\phi_1(\lambda)$. Similarly it divides all the other $\phi_i(\lambda)$.

If $p(\lambda)$ divides all the $\phi_i(\lambda)$, then it divides $\psi(\lambda)$ because of the formula for $\psi(\lambda)$ which equals $\sum_{i=1}^{p} r_i(\lambda) \phi_i(\lambda)$. ■

Lemma 16.3.7 Suppose $\phi(\lambda)$ and $\psi(\lambda)$ are monic polynomials which are irreducible and not equal. Then they are relatively prime.

Proof: Suppose $\eta(\lambda)$ is a nonconstant polynomial. If $\eta(\lambda)$ divides $\phi(\lambda)$, then since $\phi(\lambda)$ is irreducible, $\eta(\lambda)$ equals $a\phi(\lambda)$ for some $a \in \mathbb{F}$. If $\eta(\lambda)$ divides $\psi(\lambda)$ then it must be of the form $b\psi(\lambda)$ for some $b \in \mathbb{F}$ and so it follows

$$\psi(\lambda) = \frac{a}{b} \phi(\lambda)$$

but both $\psi(\lambda)$ and $\phi(\lambda)$ are monic polynomials which implies $a = b$ and so $\psi(\lambda) = \phi(\lambda)$. This is assumed not to happen. It follows the only polynomials which divide both $\psi(\lambda)$ and $\phi(\lambda)$ are constants and so the two polynomials are relatively prime. Thus a polynomial which divides them both must be a constant, and if it is monic, then it must be 1. Thus 1 is the greatest common divisor. ■

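Lemma 16.3.8 Let $\psi(\lambda)$ be an irreducible monic polynomial not equal to 1 which divides

$$\prod_{i=1}^{p} \phi_i(\lambda)^{k_i}, \quad k_i \text{ a positive integer},$$

where each $\phi_i(\lambda)$ is an irreducible monic polynomial. Then $\psi(\lambda)$ equals some $\phi_i(\lambda)$.

Proof: Suppose $\psi(\lambda) \neq \phi_i(\lambda)$ for all $i$. Then by Lemma 16.3.7 and Theorem 16.3.6, there exist polynomials $m_i(\lambda), n_i(\lambda)$ such that

$$1 = \psi(\lambda) m_i(\lambda) + \phi_i(\lambda) n_i(\lambda).$$

Hence

$$\left( \phi_i(\lambda) n_i(\lambda) \right)^{k_i} = \left( 1 - \psi(\lambda) m_i(\lambda) \right)^{k_i}.$$

Then, letting $\tilde{g}(\lambda) = \prod_{i=1}^{p} n_i(\lambda)^{k_i}$, and applying the binomial theorem, there exists a polynomial $h(\lambda)$ such that

$$\tilde{g}(\lambda) \prod_{i=1}^{p} \phi_i(\lambda)^{k_i} = \prod_{i=1}^{p} n_i(\lambda)^{k_i} \prod_{i=1}^{p} \phi_i(\lambda)^{k_i} = \prod_{i=1}^{p} \left( 1 - \psi(\lambda) m_i(\lambda) \right)^{k_i} = 1 + \psi(\lambda) h(\lambda).$$

Thus, using the fact that $\psi(\lambda)$ divides $\prod_{i=1}^{p} \phi_i(\lambda)^{k_i}$, for a suitable polynomial $g(\lambda)$,

$$g(\lambda) \psi(\lambda) = 1 + \psi(\lambda) h(\lambda)$$
$$1 = \psi(\lambda) \left( g(\lambda) - h(\lambda) \right)$$

which is impossible if $\psi(\lambda)$ is nonconstant, as assumed. ■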

Now here is a simple lemma about canceling monic polynomials. 

Lemma 16.3.9 Suppose $p(\lambda)$ is a monic polynomial and $q(\lambda)$ is a polynomial such that

$$p(\lambda) q(\lambda) = 0.$$

Then $q(\lambda) = 0$. Also if

$$p(\lambda) q_1(\lambda) = p(\lambda) q_2(\lambda)$$

then $q_1(\lambda) = q_2(\lambda)$.

Proof: Let

$$p(\lambda) = \sum_{j=0}^{k} p_j \lambda^j, \quad q(\lambda) = \sum_{i=0}^{n} q_i \lambda^i, \quad p_k = 1.$$

Then the product equals

$$\sum_{j=0}^{k} \sum_{i=0}^{n} p_j q_i \lambda^{i+j}.$$

Then look at those terms involving $\lambda^{k+n}$. This is $p_k q_n \lambda^{k+n}$ and is given to be 0. Since $p_k = 1$, it follows $q_n = 0$. Thus

$$\sum_{j=0}^{k} \sum_{i=0}^{n-1} p_j q_i \lambda^{i+j} = 0.$$

Then consider the term involving $\lambda^{n-1+k}$ and conclude that since $p_k = 1$, it follows $q_{n-1} = 0$. Continuing this way, each $q_i = 0$. This proves the first part. The second follows from

$$p(\lambda) \left( q_1(\lambda) - q_2(\lambda) \right) = 0. \; ■$$

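As a simple application, one can prove uniqueness of $q(\lambda)$ and $r(\lambda)$ in Lemma 16.3.3. Suppose $\hat{q}(\lambda), \hat{r}(\lambda)$ also work in the conclusion of this lemma. Then

$$q(\lambda) g(\lambda) + r(\lambda) = \hat{q}(\lambda) g(\lambda) + \hat{r}(\lambda)$$

and so

$$\left( q(\lambda) - \hat{q}(\lambda) \right) g(\lambda) = \hat{r}(\lambda) - r(\lambda).$$

If $r(\lambda) \neq \hat{r}(\lambda)$, then the degree of the right is less than the degree of the left which is impossible. Thus $r(\lambda) = \hat{r}(\lambda)$. Hence

$$\left( q(\lambda) - \hat{q}(\lambda) \right) g(\lambda) = 0.$$

Dividing by the leading coefficient of $g(\lambda)$ to make it monic and applying Lemma 16.3.9, $q(\lambda) = \hat{q}(\lambda)$.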

The following is the analog of the fundamental theorem of arithmetic for polynomials. 

Theorem 16.3.10 Let $f(\lambda)$ be a nonconstant polynomial with coefficients in $\mathbb{F}$. Then there is some $a \in \mathbb{F}$ such that $f(\lambda) = a \prod_{i=1}^{n} \phi_i(\lambda)$ where $\phi_i(\lambda)$ is an irreducible nonconstant monic polynomial and repeats are allowed. Furthermore, this factorization is unique in the sense that any two of these factorizations have the same nonconstant factors in the product, possibly in different order and the same constant $a$.

Proof: That such a factorization exists is obvious. If $f(\lambda)$ is irreducible, factor out the leading coefficient and you are done. If not, then $f(\lambda) = a \phi_1(\lambda) \phi_2(\lambda)$ where these are monic polynomials of smaller degree. Continue doing this with the $\phi_i$ and eventually arrive at a factorization of the desired form.

It remains to argue the factorization is unique except for order of the factors. Suppose

$$a \prod_{i=1}^{n} \phi_i(\lambda) = b \prod_{i=1}^{m} \psi_i(\lambda)$$

where the $\phi_i(\lambda)$ and the $\psi_i(\lambda)$ are all irreducible monic nonconstant polynomials and $a, b \in \mathbb{F}$. If $n > m$, then by Lemma 16.3.8, each $\psi_i(\lambda)$ equals one of the $\phi_j(\lambda)$. By the above cancellation lemma, Lemma 16.3.9, you can cancel all these $\psi_i(\lambda)$ with appropriate $\phi_j(\lambda)$ and obtain a contradiction because the resulting polynomials on either side would have different degrees. Similarly, it cannot happen that $n < m$. It follows $n = m$ and the two products consist of the same polynomials. Then it follows $a = b$. ■
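To see what the theorem is saying in a concrete case, here is a small worked illustration which is not part of the original text. Take $\mathbb{F} = \mathbb{Q}$ and $f(\lambda) = 2\lambda^4 - 2$. Factoring out the leading coefficient and then splitting repeatedly gives

$$2\lambda^4 - 2 = 2 (\lambda^2 - 1)(\lambda^2 + 1) = 2 (\lambda - 1)(\lambda + 1)(\lambda^2 + 1).$$

Here $a = 2$ and the three monic factors are irreducible over $\mathbb{Q}$, the last because it has no rational root and has degree two. Which polynomials count as irreducible depends on the field. Over $\mathbb{C}$ the last factor splits further as $(\lambda - i)(\lambda + i)$.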

The following corollary will be well used. This corollary seems rather believable but does require 
a proof. 

Corollary 16.3.11 Let $q(\lambda) \equiv \prod_{i=1}^{p} \phi_i(\lambda)^{k_i}$ where the $k_i$ are positive integers and the $\phi_i(\lambda)$ are irreducible monic polynomials. Suppose also that $p(\lambda)$ is a monic polynomial which divides $q(\lambda)$. Then

$$p(\lambda) = \prod_{i=1}^{p} \phi_i(\lambda)^{r_i}$$

where $r_i$ is a nonnegative integer no larger than $k_i$.

Proof: Using Theorem 16.3.10, let $p(\lambda) = b \prod_{i=1}^{s} \psi_i(\lambda)^{r_i}$ where the $\psi_i(\lambda)$ are each irreducible and monic and $b \in \mathbb{F}$. Since $p(\lambda)$ is monic, $b = 1$. Then there exists a polynomial $g(\lambda)$ such that

$$p(\lambda) g(\lambda) = g(\lambda) \prod_{i=1}^{s} \psi_i(\lambda)^{r_i} = \prod_{i=1}^{p} \phi_i(\lambda)^{k_i}.$$

Hence $g(\lambda)$ must be monic. Therefore,

$$p(\lambda) g(\lambda) = \prod_{i=1}^{s} \psi_i(\lambda)^{r_i} \prod_{j=1}^{l} \eta_j(\lambda) = \prod_{i=1}^{p} \phi_i(\lambda)^{k_i}$$

for $\eta_j$ monic and irreducible. By uniqueness, each $\psi_i$ equals one of the $\phi_j(\lambda)$ and the same holds true of the $\eta_i(\lambda)$. Therefore, $p(\lambda)$ is of the desired form. ■

Is there a way to compute the greatest common divisor of two polynomials? Let $m(\lambda), n(\lambda)$ be two nonzero polynomials, not equal since otherwise there is nothing to show, with the degree of $m(\lambda)$ no larger than the degree of $n(\lambda)$. Then by Lemma 16.3.3, there exist unique $q(\lambda), m_1(\lambda)$ where $m_1(\lambda)$ either equals 0 or has degree less than the degree of $m(\lambda)$ such that

$$n(\lambda) = m(\lambda) q(\lambda) + m_1(\lambda).$$

($q(\lambda)$ will refer to a generic polynomial in what follows.) Then if $l(\lambda)$ divides both $n(\lambda)$ and $m(\lambda)$, then $l(\lambda)$ must also divide $m_1(\lambda)$. Conversely, if $l(\lambda)$ divides both $m(\lambda)$ and $m_1(\lambda)$, then it divides $n(\lambda)$. Hence in determining the greatest common divisor, one can consider the two polynomials $m(\lambda), m_1(\lambda)$ instead. Then write

$$m(\lambda) = m_1(\lambda) q(\lambda) + m_2(\lambda)$$

where the degree of $m_2(\lambda)$ is less than the degree of $m_1(\lambda)$ or else $m_2(\lambda)$ equals 0. Continuing this way, one obtains a sequence of pairs of polynomials $(m_i(\lambda), m_{i+1}(\lambda))$ which have the same greatest common divisor as the original $n(\lambda)$ and $m(\lambda)$ but such that the degrees of the $m_i(\lambda)$ form a strictly decreasing sequence. From the construction just described, it must be the case that eventually some remainder is a constant. Say this happens when

$$m_{k-1}(\lambda) = m_k(\lambda) q(\lambda) + m_{k+1}$$

where $m_{k+1}$ is a constant. If it is 0 then the greatest common divisor is just $m_k(\lambda)$ normalized to make the leading coefficient equal to 1. If it is not zero, then the greatest common divisor is 1.
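The procedure just described can be carried out by machine. The following is a minimal sketch using exact rational arithmetic. It uses the standard form of the Euclidean algorithm, in which each step replaces the pair (dividend, divisor) by (divisor, remainder). The helper names `trim`, `remainder` and `poly_gcd` and the list representation with the constant term first are assumptions made here for illustration.

```python
from fractions import Fraction

def trim(p):
    """Drop trailing zero coefficients (the constant term is stored first)."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def remainder(f, g):
    """Remainder of f on division by a nonzero polynomial g."""
    r, g = trim(f), trim(g)
    while len(r) >= len(g):
        shift, factor = len(r) - len(g), r[-1] / g[-1]
        for i, c in enumerate(g):
            r[i + shift] -= factor * c   # cancel the leading term of r
        r = trim(r)
    return r

def poly_gcd(f, g):
    """Monic greatest common divisor of two polynomials, not both zero."""
    a = trim(Fraction(c) for c in f)
    b = trim(Fraction(c) for c in g)
    while b:                              # stop when the remainder is zero
        a, b = b, remainder(a, b)
    return [c / a[-1] for c in a]         # normalize the leading coefficient

# gcd of x^2 + 3x + 2 and x^3 + 3x^2 - x - 3
print(poly_gcd([2, 3, 1], [-3, -1, 3, 1]))
```

The last nonzero remainder is the greatest common divisor up to a constant factor, so the final line divides by the leading coefficient to make it monic. When that remainder is a nonzero constant, the result is the polynomial 1.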

Example 16.3.12 Find the greatest common divisor of $x^2 + 2x + 1$ and $x^3 + 4x^2 + 5x + 2$.

By the Euclidean algorithm for polynomials

$$x^3 + 4x^2 + 5x + 2 = (x^2 + 2x + 1)(x + 2) + 0.$$

Hence the greatest common divisor is $x^2 + 2x + 1$.

Example 16.3.13 Find the greatest common divisor of $x^2 + 3x + 2$ and $x^3 + 3x^2 - x - 3$.

By the Euclidean algorithm for polynomials,

$$x^3 + 3x^2 - x - 3 = (x^2 + 3x + 2) x + (-3x - 3).$$

Thus you can consider the two polynomials $x^2 + 3x + 2$ and $-3x - 3$. Now divide again.

$$x^2 + 3x + 2 = \left( -\frac{1}{3} x - \frac{2}{3} \right)(-3x - 3) + 0$$

and so it follows that the greatest common divisor is $-3x - 3$ normalized to be monic, which is $x + 1$.

Example 16.3.14 Find the greatest common divisor of $x^2 + 3x + 2$ and $x^2 + 2x - 3$.

By the Euclidean algorithm for polynomials,

$$x^2 + 3x + 2 = (x^2 + 2x - 3) \cdot 1 + (x + 5).$$

Now you can consider $x^2 + 2x - 3$ and $x + 5$. Do another division.

$$x^2 + 2x - 3 = (x - 3)(x + 5) + 12$$

The remainder is a nonzero constant. Then it follows that the greatest common divisor is 1. Thus the two original polynomials are relatively prime.

16.3.2 Polynomials And Fields

When you have a polynomial like $x^2 - 3$ which has no rational roots, it turns out you can enlarge the field of rational numbers to obtain a larger field such that this polynomial does have roots in this larger field. I am going to discuss a systematic way to do this. It will turn out that for any polynomial with coefficients in any field, there always exists a possibly larger field such that the polynomial has roots in this larger field. This book has mainly featured the field of real or complex numbers but this procedure will show how to obtain many other fields which could be used in most of what was presented earlier in the book. Here is an important idea concerning equivalence relations.

Definition 16.3.15 Let $S$ be a set. The symbol $\sim$ is called an equivalence relation on $S$ if it satisfies the following axioms.

1. $x \sim x$ for all $x \in S$. (Reflexive)

2. If $x \sim y$ then $y \sim x$. (Symmetric)

3. If $x \sim y$ and $y \sim z$, then $x \sim z$. (Transitive)

Definition 16.3.16 $[x]$ denotes the set of all elements of $S$ which are equivalent to $x$ and $[x]$ is called the equivalence class determined by $x$ or just the equivalence class of $x$.

Also recall the following fact about equivalence classes.

Theorem 16.3.17 Let $\sim$ be an equivalence relation defined on a set $S$ and let $\mathcal{H}$ denote the set of equivalence classes. Then if $[x]$ and $[y]$ are two of these equivalence classes, either $x \sim y$ and $[x] = [y]$ or it is not true that $x \sim y$ and $[x] \cap [y] = \emptyset$.

Definition 16.3.18 Let $\mathbb{F}$ be a field, for example the rational numbers, and denote by $\mathbb{F}[x]$ the polynomials having coefficients in $\mathbb{F}$. Suppose $p(x)$ is a polynomial. Let $a(x) \sim b(x)$ ($a(x)$ is similar to $b(x)$) when

$$a(x) - b(x) = k(x) p(x)$$

for some polynomial $k(x)$.

Proposition 16.3.19 In the above definition, ~ is an equivalence relation. 

Proof: First of all, note that $a(x) \sim a(x)$ because their difference equals $0 p(x)$. If $a(x) \sim b(x)$, then $a(x) - b(x) = k(x) p(x)$ for some $k(x)$. But then $b(x) - a(x) = -k(x) p(x)$ and so $b(x) \sim a(x)$. Next suppose $a(x) \sim b(x)$ and $b(x) \sim c(x)$. Then $a(x) - b(x) = k(x) p(x)$ for some polynomial $k(x)$ and also $b(x) - c(x) = l(x) p(x)$ for some polynomial $l(x)$. Then

$$a(x) - c(x) = a(x) - b(x) + b(x) - c(x)$$
$$= k(x) p(x) + l(x) p(x) = (l(x) + k(x)) p(x)$$

and so $a(x) \sim c(x)$ and this shows the transitive law. ■

With this proposition, here is another definition which essentially describes the elements of the 
new field. It will eventually be necessary to assume the polynomial p (x) in the above definition is 
irreducible so I will begin assuming this. 

Definition 16.3.20 Let $\mathbb{F}$ be a field and let $p(x) \in \mathbb{F}[x]$ be irreducible. This means there is no polynomial which divides $p(x)$ except for itself and constants. For the similarity relation defined in Definition 16.3.18, define the following operations on the equivalence classes. $[a(x)]$ is an equivalence class means that it is the set of all polynomials which are similar to $a(x)$.

$$[a(x)] + [b(x)] \equiv [a(x) + b(x)]$$
$$[a(x)][b(x)] \equiv [a(x) b(x)]$$

This collection of equivalence classes is sometimes denoted by $\mathbb{F}[x] / (p(x))$.

Proposition 16.3.21 In the situation of Definition 16.3.20, $p(x)$ and $q(x)$ are relatively prime for any $q(x) \in \mathbb{F}[x]$ which is not a multiple of $p(x)$. Also the definitions of addition and multiplication are well defined. In addition, if $a, b \in \mathbb{F}$ and $[a] = [b]$, then $a = b$.

Proof: First consider the claim about $p(x), q(x)$ being relatively prime. If $\psi(x)$ is the greatest common divisor, it follows $\psi(x)$ is either equal to $p(x)$ normalized to be monic or 1, because $p(x)$ is irreducible. If it is the former, then $q(x)$ is a multiple of $p(x)$. If it is 1, then by definition, the two polynomials are relatively prime.

To show the operations are well defined, suppose

$$[a(x)] = [a'(x)], \quad [b(x)] = [b'(x)].$$

It is necessary to show

$$[a(x) + b(x)] = [a'(x) + b'(x)]$$
$$[a(x) b(x)] = [a'(x) b'(x)].$$

Consider the second of the two.

$$a'(x) b'(x) - a(x) b(x)$$
$$= a'(x) b'(x) - a(x) b'(x) + a(x) b'(x) - a(x) b(x)$$
$$= b'(x) (a'(x) - a(x)) + a(x) (b'(x) - b(x))$$

Now by assumption $(a'(x) - a(x))$ is a multiple of $p(x)$ as is $(b'(x) - b(x))$, so the above is a multiple of $p(x)$ and by definition this shows $[a(x) b(x)] = [a'(x) b'(x)]$. The case for addition is similar.

Now suppose $[a] = [b]$. This means $a - b = k(x) p(x)$ for some polynomial $k(x)$. Then $k(x)$ must equal 0 since otherwise the two polynomials $a - b$ and $k(x) p(x)$ could not be equal because they would have different degree. ■

Note that from this proposition and math induction, if each $a_i \in \mathbb{F}$,

$$[a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0]$$
$$= [a_n][x]^n + [a_{n-1}][x]^{n-1} + \cdots + [a_1][x] + [a_0] \qquad (16.7)$$

With the above preparation, here is a definition of a field in which the irreducible polynomial $p(x)$ has a root.

Definition 16.3.22 Let $p(x) \in \mathbb{F}[x]$ be irreducible and let $a(x) \sim b(x)$ when $a(x) - b(x)$ is a multiple of $p(x)$. Let $\mathbb{G}$ denote the set of equivalence classes as described above with the operations also described in Definition 16.3.20.

Also here is another useful definition and a simple proposition which comes from it. 

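Definition 16.3.23 Let $\mathbb{F} \subseteq \mathbb{K}$ be two fields. Then clearly $\mathbb{K}$ is also a vector space over $\mathbb{F}$. Then also, $\mathbb{K}$ is called a finite field extension of $\mathbb{F}$ if the dimension of this vector space, denoted by $[\mathbb{K} : \mathbb{F}]$, is finite.

There are some easy things to observe about this.

Proposition 16.3.24 Let $\mathbb{F} \subseteq \mathbb{K} \subseteq \mathbb{L}$ be fields. Then $[\mathbb{L} : \mathbb{F}] = [\mathbb{L} : \mathbb{K}][\mathbb{K} : \mathbb{F}]$.

Proof: Let $\{l_i\}_{i=1}^{n}$ be a basis for $\mathbb{L}$ over $\mathbb{K}$ and let $\{k_j\}_{j=1}^{m}$ be a basis of $\mathbb{K}$ over $\mathbb{F}$. Then if $l \in \mathbb{L}$, there exist unique scalars $x_i$ in $\mathbb{K}$ such that

$$l = \sum_{i=1}^{n} x_i l_i.$$

Now $x_i \in \mathbb{K}$ so there exist $f_{ji} \in \mathbb{F}$ such that

$$x_i = \sum_{j=1}^{m} f_{ji} k_j.$$

Then it follows that

$$l = \sum_{i=1}^{n} \sum_{j=1}^{m} f_{ji} k_j l_i.$$

It follows that $\{k_j l_i\}$ is a spanning set. If

$$\sum_{i=1}^{n} \sum_{j=1}^{m} f_{ji} k_j l_i = 0,$$

then, since the $l_i$ are independent over $\mathbb{K}$, it follows that

$$\sum_{j=1}^{m} f_{ji} k_j = 0$$

and since $\{k_j\}$ is independent over $\mathbb{F}$, each $f_{ji} = 0$ for each $j$ for a given arbitrary $i$. Therefore, $\{k_j l_i\}$ is a basis. ■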
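Here is a small worked illustration of the proposition, added as an example and not taken from the original text. Let $\mathbb{F} = \mathbb{Q}$, let $\mathbb{K}$ consist of the real numbers $a + b\sqrt{2}$ with $a, b$ rational, and let $\mathbb{L}$ consist of the real numbers $u + v\sqrt{3}$ with $u, v \in \mathbb{K}$. A basis for $\mathbb{K}$ over $\mathbb{F}$ is $\{1, \sqrt{2}\}$ and a basis for $\mathbb{L}$ over $\mathbb{K}$ is $\{1, \sqrt{3}\}$. The products $k_j l_i$ in the proof are then

$$\{1, \sqrt{2}, \sqrt{3}, \sqrt{6}\}$$

and so

$$[\mathbb{L} : \mathbb{F}] = [\mathbb{L} : \mathbb{K}][\mathbb{K} : \mathbb{F}] = 2 \cdot 2 = 4.$$

This uses the standard fact that $\sqrt{3}$ is not of the form $a + b\sqrt{2}$ for rational $a, b$, which is what makes $\{1, \sqrt{3}\}$ independent over $\mathbb{K}$.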

Theorem 16.3.25 The set of all equivalence classes $\mathbb{G} \equiv \mathbb{F}[x] / (p(x))$ described above with the multiplicative identity given by $[1]$ and the additive identity given by $[0]$ along with the operations of Definition 16.3.20, is a field and $p([x]) = [0]$. (Thus $p$ has a root in this new field.) In addition to this, $[\mathbb{G} : \mathbb{F}] = n$, the degree of $p(x)$.

Proof: Everything is obvious except for the existence of the multiplicative inverse and the assertion that $p([x]) = 0$. Suppose then that $[a(x)] \neq [0]$. That is, $a(x)$ is not a multiple of $p(x)$. Why does $[a(x)]^{-1}$ exist? By Proposition 16.3.21, $a(x), p(x)$ are relatively prime and so by Theorem 16.3.6 there exist polynomials $\psi(x), \phi(x)$ such that

$$1 = \psi(x) p(x) + a(x) \phi(x)$$

and so

$$1 - a(x) \phi(x) = \psi(x) p(x)$$

which, by definition implies

$$[1 - a(x) \phi(x)] = [1] - [a(x) \phi(x)] = [1] - [a(x)][\phi(x)] = [0]$$

and so $[\phi(x)] = [a(x)]^{-1}$. This shows $\mathbb{G}$ is a field.

Now if $p(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0$, then $p([x]) = 0$ by 16.7 and the definition which says $[p(x)] = [0]$.

Consider the claim about the dimension. It was just shown that $[1], [x], [x^2], \cdots, [x^n]$ is linearly dependent. Also $[1], [x], [x^2], \cdots, [x^{n-1}]$ is independent because if not, there would exist a nonzero polynomial $q(x)$ of degree at most $n - 1$ which is a multiple of $p(x)$, which is impossible. Now for $[q(x)] \in \mathbb{G}$, you can write

$$q(x) = p(x) l(x) + r(x)$$

where the degree of $r(x)$ is less than $n$ or else it equals 0. Either way, $[q(x)] = [r(x)]$ which is a linear combination of $[1], [x], [x^2], \cdots, [x^{n-1}]$. Thus $[\mathbb{G} : \mathbb{F}] = n$ as claimed. ■
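To make the arithmetic in $\mathbb{F}[x]/(p(x))$ concrete, here is a minimal sketch for $\mathbb{F} = \mathbb{Q}$ and $p(x) = x^2 - 3$, the polynomial mentioned at the start of this section. Each class is stored as the list of coefficients of its representative of degree less than $n$, constant term first. The function names `reduce_mod` and `multiply` are made up for this illustration, and the code assumes $p$ is monic.

```python
from fractions import Fraction

def reduce_mod(f, p):
    """Representative of [f] of degree < deg p (p monic, constant term first)."""
    r = [Fraction(c) for c in f]
    n = len(p) - 1                       # degree of p
    while len(r) > n:
        lead = r.pop()                   # coefficient of the highest power left
        shift = len(r) - n
        for i in range(n):
            r[i + shift] -= lead * p[i]  # replace x^n by minus the lower terms of p
    return r + [Fraction(0)] * (n - len(r))

def multiply(a, b, p):
    """Product [a][b] = [ab] in F[x]/(p(x))."""
    prod = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            prod[i + j] += Fraction(x) * Fraction(y)
    return reduce_mod(prod, p)

p = [-3, 0, 1]                           # x^2 - 3
x_class = [0, 1]                         # the class [x]
square = multiply(x_class, x_class, p)   # [x]^2
print(square)
```

The product $[x][x]$ reduces to the class of the constant 3, so $[x]^2 - [3] = [0]$, which says that $[x]$ is a root of $x^2 - 3$ in the new field.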

Note that if $p(x)$ were not irreducible, then you could find a field extension $\mathbb{G}$ such that $[\mathbb{G} : \mathbb{F}] < n$. You could do this by working with an irreducible factor of $p(x)$.

Usually, people simply write $b$ rather than $[b]$ if $b \in \mathbb{F}$. Then with this convention,

$$[b \phi(x)] = [b][\phi(x)] = b[\phi(x)].$$

This shows how to enlarge a field to get a new one in which the polynomial has a root. By using a succession of such enlargements, called field extensions, there will exist a field in which the given polynomial can be factored into a product of polynomials having degree one. The field you obtain in this process of enlarging in which the given polynomial factors in terms of linear factors is called a splitting field.

Theorem 16.3.26 Let $p(x) = x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0$ be a polynomial with coefficients in a field of scalars $\mathbb{F}$. There exists a larger field $\mathbb{G}$ such that there exist $\{z_1, \cdots, z_n\}$ listed according to multiplicity such that

$$p(x) = \prod_{i=1}^{n} (x - z_i).$$

This larger field is called the splitting field. Furthermore,

$$[\mathbb{G} : \mathbb{F}] \leq n!$$

Proof: From Theorem 16.3.25, there exists a field $\mathbb{F}_1$ such that $p(x)$ has a root $z_1$ ($= [x]$ if $p$ is irreducible). Then by the Euclidean algorithm

$$p(x) = (x - z_1) q_1(x) + r$$

where $r \in \mathbb{F}_1$. Since $p(z_1) = 0$, this requires $r = 0$. Now do the same for $q_1(x)$ that was done for $p(x)$, enlarging the field to $\mathbb{F}_2$ if necessary, such that in this new field

$$q_1(x) = (x - z_2) q_2(x)$$

and so

$$p(x) = (x - z_1)(x - z_2) q_2(x).$$

After $n$ such extensions, you will have obtained the necessary field $\mathbb{G}$.

Finally consider the claim about dimension. By Theorem 16.3.25, there is a larger field $\mathbb{G}_1$ such that $p(x)$ has a root $a_1$ in $\mathbb{G}_1$ and $[\mathbb{G}_1 : \mathbb{F}] \leq n$. Then

$$p(x) = (x - a_1) q(x)$$

where $q(x)$ has degree $n - 1$, so the next extension has dimension no more than $n - 1$ over $\mathbb{G}_1$. Continue this way until the polynomial equals the product of linear factors. Then by Proposition 16.3.24 applied multiple times, $[\mathbb{G} : \mathbb{F}] \leq n!$. ■
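The bound $n!$ can actually be attained. As an illustration which is not part of the original text, take $\mathbb{F} = \mathbb{Q}$ and $p(x) = x^3 - 2$. Adjoining the real root $\sqrt[3]{2}$ gives a field $\mathbb{F}_1$ with $[\mathbb{F}_1 : \mathbb{Q}] = 3$ in which

$$x^3 - 2 = \left( x - \sqrt[3]{2} \right) \left( x^2 + \sqrt[3]{2}\, x + \sqrt[3]{4} \right).$$

The quadratic factor has no real roots, so it is irreducible over $\mathbb{F}_1 \subseteq \mathbb{R}$, and one more extension of dimension 2 is needed to split it. By Proposition 16.3.24 the splitting field $\mathbb{G}$ obtained this way satisfies

$$[\mathbb{G} : \mathbb{Q}] = 2 \cdot 3 = 6 = 3!$$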

Example 16.3.27 The polynomial $x^2 + 1$ is irreducible in $\mathbb{R}[x]$, polynomials having real coefficients. To see this is the case, suppose $\psi(x)$ divides $x^2 + 1$. Then

$$x^2 + 1 = \psi(x) q(x).$$

If the degree of $\psi(x)$ is less than 2, then it must be either a constant or of the form $ax + b$. In the latter case, $-b/a$ must be a zero of the right side, hence of the left but $x^2 + 1$ has no real zeros. Therefore, the degree of $\psi(x)$ must be two and $q(x)$ must be a constant. Thus the only polynomials which divide $x^2 + 1$ are constants and multiples of $x^2 + 1$. Therefore, this shows $x^2 + 1$ is irreducible. Find the inverse of $[x^2 + x + 1]$ in the space of equivalence classes, $\mathbb{R}[x] / (x^2 + 1)$.

You can solve this with partial fractions.

$$\frac{1}{(x^2 + 1)(x^2 + x + 1)} = -\frac{x}{x^2 + 1} + \frac{x + 1}{x^2 + x + 1}$$

and so

$$1 = (-x)(x^2 + x + 1) + (x + 1)(x^2 + 1)$$

which implies

$$1 \sim (-x)(x^2 + x + 1)$$

and so the inverse is $[-x]$.
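Partial fractions is one way to find the polynomials in Theorem 16.3.6. Another is to run the Euclidean algorithm while keeping track of how each remainder is expressed in terms of the original polynomial. The following is a minimal sketch of that approach (often called the extended Euclidean algorithm, which is swapped in here in place of partial fractions). All function names are hypothetical and coefficients are stored with the constant term first.

```python
from fractions import Fraction

def trim(p):
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def add(a, b):
    n = max(len(a), len(b))
    a, b = a + [0] * (n - len(a)), b + [0] * (n - len(b))
    return trim(x + y for x, y in zip(a, b))

def mul(a, b):
    if not a or not b:
        return []
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return trim(out)

def divmod_poly(f, g):
    """Quotient and remainder of f by a nonzero polynomial g."""
    q, r = [], trim(f)
    while len(r) >= len(g):
        shift, factor = len(r) - len(g), r[-1] / g[-1]
        term = [Fraction(0)] * shift + [factor]
        q = add(q, term)
        r = add(r, mul([-c for c in term], g))
    return q, r

def inverse_mod(a, p):
    """Return b with [a][b] = [1] in F[x]/(p(x)), assuming gcd(a, p) = 1."""
    p = trim(Fraction(c) for c in p)
    r0, r1 = p, trim(Fraction(c) for c in a)
    s0, s1 = [], [Fraction(1)]           # invariant: r_i ~ s_i * a modulo p
    while r1:
        q, r = divmod_poly(r0, r1)
        r0, r1 = r1, r
        s0, s1 = s1, add(s0, mul([-c for c in q], s1))
    if len(r0) != 1:
        raise ValueError("a and p are not relatively prime")
    inv = [c / r0[0] for c in s0]        # scale so that the constant becomes 1
    return divmod_poly(inv, p)[1]

# Inverse of [x^2 + x + 1] in R[x]/(x^2 + 1)
print(inverse_mod([1, 1, 1], [1, 0, 1]))
```

For the example above this returns the coefficients of $-x$, in agreement with the partial fractions computation.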

The following proposition is interesting. It was essentially proved above but to emphasize it, 
here it is again. 

Proposition 16.3.28 Suppose $p(x) \in \mathbb{F}[x]$ is irreducible and has degree $n$. Then every element of $\mathbb{G} = \mathbb{F}[x] / (p(x))$ is of the form $[0]$ or $[r(x)]$ where the degree of $r(x)$ is less than $n$.

Proof: This follows right away from the Euclidean algorithm for polynomials. If $k(x)$ has degree larger than $n - 1$, then

$$k(x) = q(x) p(x) + r(x)$$

where $r(x)$ is either equal to 0 or has degree less than $n$. Hence

$$[k(x)] = [r(x)]. \; ■$$

Example 16.3.29 In the situation of the above example, find $[ax + b]^{-1}$ assuming $a^2 + b^2 \neq 0$. Note this includes all cases of interest thanks to the above proposition.

You can do it with partial fractions as above.

$$\frac{1}{(x^2 + 1)(ax + b)} = \frac{b - ax}{(a^2 + b^2)(x^2 + 1)} + \frac{a^2}{(a^2 + b^2)(ax + b)}$$

and so

$$1 = \frac{1}{a^2 + b^2} (b - ax)(ax + b) + \frac{a^2}{a^2 + b^2} (x^2 + 1).$$

Thus

$$\frac{1}{a^2 + b^2} (b - ax)(ax + b) \sim 1$$

and so

$$[ax + b]^{-1} = \frac{[(b - ax)]}{a^2 + b^2} = \frac{b - a[x]}{a^2 + b^2}.$$

You might find it interesting to recall that $(ai + b)^{-1} = \frac{b - ai}{a^2 + b^2}$.

16.3.3 The Algebraic Numbers 

Each polynomial having coefficients in a field $\mathbb{F}$ has a splitting field. Consider the case of all polynomials $p(x)$ having coefficients in a field $\mathbb{F} \subseteq \mathbb{G}$ and look at all roots which are also in $\mathbb{G}$. The theory of vector spaces is very useful in the study of these algebraic numbers. Here is a definition.

Definition 16.3.30 The algebraic numbers $\mathbb{A}$ are those numbers which are in $\mathbb{G}$ and also roots of some polynomial $p(x)$ having coefficients in $\mathbb{F}$.

Theorem 16.3.31 Let $a \in \mathbb{A}$. Then there exists a unique monic irreducible polynomial $p(x)$ having coefficients in $\mathbb{F}$ such that $p(a) = 0$. This is called the minimal polynomial for $a$.

Proof: By definition, there exists a polynomial $q(x)$ having coefficients in $\mathbb{F}$ such that $q(a) = 0$.
If q (x) is irreducible, divide by the leading coefficient and this proves the existence. If q (x) is not
irreducible, then there exist nonconstant polynomials $r(x)$ and $k(x)$ such that $q(x) = r(x) k(x)$. Then one of $r(a), k(a)$ equals 0. Pick the one which equals zero and let it play the role of $q(x)$. Continuing this way, in finitely many steps one obtains an irreducible polynomial $p(x)$ such that $p(a) = 0$. Now divide by the leading coefficient and this proves existence. Suppose $p_i$, $i = 1, 2$ both work and they are not equal. Then by Lemma 16.3.7 they must be relatively prime because they are both assumed to be irreducible and so there exist polynomials $l(x), k(x)$ such that

$$1 = l(x) p_1(x) + k(x) p_2(x).$$

But now when $a$ is substituted for $x$, this yields $0 = 1$, a contradiction. The polynomials are equal after all. ■

Definition 16.3.32 For $a$ an algebraic number, let $\deg(a)$ denote the degree of the minimal polynomial of $a$.

Also, here is another definition.

Definition 16.3.33 Let $a_1, \cdots, a_m$ be in $\mathbb{A}$. A polynomial in $\{a_1, \cdots, a_m\}$ will be an expression of the form

$$\sum_{k_1 \cdots k_m} a_{k_1 \cdots k_m} a_1^{k_1} \cdots a_m^{k_m}$$

where the $a_{k_1 \cdots k_m}$ are in $\mathbb{F}$, each $k_j$ is a nonnegative integer, and all but finitely many of the $a_{k_1 \cdots k_m}$ equal zero. The collection of such polynomials will be denoted by

$$\mathbb{F}[a_1, \cdots, a_m].$$

Now notice that for $a$ an algebraic number, $\mathbb{F}[a]$ is a vector space with field of scalars $\mathbb{F}$. Similarly, for $\{a_1, \cdots, a_m\}$ algebraic numbers, $\mathbb{F}[a_1, \cdots, a_m]$ is a vector space with field of scalars $\mathbb{F}$. The following fundamental proposition is important.

Proposition 16.3.34 Let $\{a_1, \cdots, a_m\}$ be algebraic numbers. Then

$$\dim \mathbb{F}[a_1, \cdots, a_m] \leq \prod_{i=1}^{m} \deg(a_i)$$

and for an algebraic number $a$,

$$\dim \mathbb{F}[a] = \deg(a).$$

Every element of $\mathbb{F}[a_1, \cdots, a_m]$ is in $\mathbb{A}$ and $\mathbb{F}[a_1, \cdots, a_m]$ is a field.

Proof: First consider the second assertion. Let the minimal polynomial of $a$ be

$$p(x) = x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0.$$

Since $p(a) = 0$, it follows $\{1, a, a^2, \cdots, a^n\}$ is linearly dependent. However, if the degree of $q(x)$ is less than the degree of $p(x)$, then if $q(x)$ is not a constant, the two must be relatively prime because $p(x)$ is irreducible and so there exist polynomials $k(x), l(x)$ such that

$$1 = l(x) q(x) + k(x) p(x)$$

and this is a contradiction if $q(a) = 0$ because it would imply upon replacing $x$ with $a$ that $1 = 0$. Therefore, no nonzero polynomial having degree less than $n$ can have $a$ as a root. It follows

$$\{1, a, a^2, \cdots, a^{n-1}\}$$

is linearly independent. Thus $\dim \mathbb{F}[a] = \deg(a) = n$. Here is why this is. If $q(a)$ is any element of $\mathbb{F}[a]$,

$$q(x) = p(x) k(x) + r(x)$$

where $\deg r(x) < \deg p(x)$ or $r(x) = 0$, and so $q(a) = r(a)$ and $r(a) \in \operatorname{span}(1, a, a^2, \cdots, a^{n-1})$.

Now consider the first claim. By definition, $\mathbb{F}[a_1, \cdots, a_m]$ is obtained from all linear combinations of products $a_1^{k_1} a_2^{k_2} \cdots a_m^{k_m}$ where the $k_i$ are nonnegative integers. From the first part, it suffices to consider only $k_j < \deg(a_j)$. Therefore, there exists a spanning set for $\mathbb{F}[a_1, \cdots, a_m]$ which has

$$\prod_{i=1}^{m} \deg(a_i)$$

entries. By Theorem 16.2.4 this proves the first claim.

Finally consider the last claim. Let $g(a_1, \cdots, a_m)$ be a polynomial in $\{a_1, \cdots, a_m\}$ in $\mathbb{F}[a_1, \cdots, a_m]$. Since

$$\dim \mathbb{F}[a_1, \cdots, a_m] \equiv p \leq \prod_{j=1}^{m} \deg(a_j) < \infty,$$

it follows

$$1, g(a_1, \cdots, a_m), g(a_1, \cdots, a_m)^2, \cdots, g(a_1, \cdots, a_m)^p$$

are dependent. It follows $g(a_1, \cdots, a_m)$ is the root of some polynomial having coefficients in $\mathbb{F}$. Thus everything in $\mathbb{F}[a_1, \cdots, a_m]$ is algebraic. Why is $\mathbb{F}[a_1, \cdots, a_m]$ a field? Let $g(a_1, \cdots, a_m)$ be as just mentioned and nonzero. Then it has a minimal polynomial of some degree $k \leq p$,

$$p(x) = x^k + a_{k-1} x^{k-1} + \cdots + a_1 x + a_0$$

where the $a_i \in \mathbb{F}$. Then $a_0 \neq 0$ or else the polynomial would not be minimal. Therefore,

$$g(a_1, \cdots, a_m) \left( g(a_1, \cdots, a_m)^{k-1} + a_{k-1} g(a_1, \cdots, a_m)^{k-2} + \cdots + a_1 \right) = -a_0$$

and so the multiplicative inverse for $g(a_1, \cdots, a_m)$ is

$$\frac{g(a_1, \cdots, a_m)^{k-1} + a_{k-1} g(a_1, \cdots, a_m)^{k-2} + \cdots + a_1}{-a_0} \in \mathbb{F}[a_1, \cdots, a_m].$$

The other axioms of a field are obvious. ■

Now from this proposition, it is easy to obtain the following interesting result about the algebraic 
numbers. 

Theorem 16.3.35 The algebraic numbers $\mathbb{A}$, those roots of polynomials in $\mathbb{F}[x]$ which are in $\mathbb{G}$, are a field.

Proof: Let $a$ be a nonzero algebraic number and let $p(x)$ be its minimal polynomial. Then $p(x)$ is of the form

$$x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0$$

where $a_0 \neq 0$. Then plugging in $a$ yields

$$a \, \frac{\left( a^{n-1} + a_{n-1} a^{n-2} + \cdots + a_1 \right)(-1)}{a_0} = 1$$

and so $a^{-1} = \frac{\left( a^{n-1} + a_{n-1} a^{n-2} + \cdots + a_1 \right)(-1)}{a_0} \in \mathbb{F}[a]$. By the proposition, every element of $\mathbb{F}[a]$ is in $\mathbb{A}$ and this shows that for every nonzero element of $\mathbb{A}$, its inverse is also in $\mathbb{A}$. What about products and sums of things in $\mathbb{A}$? Are they still in $\mathbb{A}$? Yes. If $a, b \in \mathbb{A}$, then both $a + b$ and $ab \in \mathbb{F}[a, b]$ and from the proposition, each element of $\mathbb{F}[a, b]$ is in $\mathbb{A}$. ■
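The theorem says that a sum of algebraic numbers is a root of some polynomial with coefficients in $\mathbb{F}$, but the proof does not produce that polynomial. Here is a small numerical illustration, added as an example and not taken from the text, for $\mathbb{F} = \mathbb{Q}$, $a = \sqrt{2}$ and $b = \sqrt{3}$. Squaring $a + b$ twice shows it is a root of $x^4 - 10x^2 + 1$, and the sketch below checks this in floating point, so the residual is only approximately zero. The tolerance `1e-9` is an arbitrary choice.

```python
import math

a = math.sqrt(2)   # root of x^2 - 2
b = math.sqrt(3)   # root of x^2 - 3
s = a + b          # the sum, claimed to be algebraic as well

def q(x):
    """x^4 - 10 x^2 + 1, a monic polynomial with rational coefficients."""
    return x**4 - 10 * x**2 + 1

residual = q(s)
print(abs(residual) < 1e-9)

# The product ab = sqrt(6) is a root of x^2 - 6.
product_residual = (a * b) ** 2 - 6
print(abs(product_residual) < 1e-9)
```

Note that the degree 4 of this polynomial agrees with the bound $\deg(a) \deg(b) = 4$ of Proposition 16.3.34.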

A typical example of what is of interest here is when the field F of scalars is Q, the rational 
numbers and the field G is R or C. However, you can certainly conceive of many other examples 
by considering the integers mod a prime, for example (See Problems 1 - Problem 4 on Page 420 for 
example.) or any of the fields which occur as field extensions in the above. 
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16.3.4 The Lindemann Weierstrass Theorem And Vector Spaces 

As another application of the abstract concept of vector spaces, there is an amazing theorem due to Weierstrass and Lindemann. There is a proof of this theorem in [9]. It is also in an appendix of Linear Algebra.

Theorem 16.3.36 Suppose $a_1, \cdots, a_n$ are nonzero algebraic numbers and suppose $\alpha_1, \cdots, \alpha_n$ are distinct algebraic numbers. Then

$$\sum_{i=1}^{n} a_i e^{\alpha_i} \neq 0.$$

In other words, the $\{e^{\alpha_1}, \cdots, e^{\alpha_n}\}$ are independent as vectors with field of scalars equal to the algebraic numbers.

A number is transcendental if it is not a root of a nonzero polynomial which has integer coefficients. Most numbers are this way but it is hard to verify that specific numbers are transcendental. That $\pi$ is transcendental follows from

$$e^0 + e^{i\pi} = 0.$$

By the above theorem, this could not happen if $\pi$ were algebraic because then $i\pi$ would also be algebraic. Recall these algebraic numbers form a field and $i$ is clearly algebraic, being a root of $x^2 + 1$. This fact about $\pi$ was first proved by Lindemann in 1882 and then the general theorem above was proved by Weierstrass in 1885. This fact that $\pi$ is transcendental solved an old problem called squaring the circle which was to construct a square with the same area as a circle using a straight edge and compass. It can be shown that the fact $\pi$ is transcendental implies this problem is impossible.¹

16.4 Exercises 

1. Let M = {u = (^1,^2,^3,^4) G M 4 : |iii| < 4} . Is M a subspace? Explain. 



1 Gilbert, the librettist of the Savoy operas, may have heard about this great achievement. In Princess Ida which
opened in 1884 he has the following lines. "As for fashion they forswear it, so they say - so they say; and the circle -
they will square it some fine day some fine day." Of course it had been proved impossible to do this a couple of years
before.
2. Let $M = \{\mathbf{u} = (u_1, u_2, u_3, u_4) \in \mathbb{R}^4 : \sin(u_1) = 1\}$. Is $M$ a subspace? Explain.

3. If you have 5 vectors in $F^5$ and the vectors are linearly independent, can it always be concluded
they span $F^5$? Here $F$ is an arbitrary field. Explain.

4. If you have 6 vectors in $F^5$, is it possible they are linearly independent? Here $F$ is an arbitrary
field. Explain.

5. Show in any vector space, $\mathbf{0}$ is unique.

6. ↑In any vector space, show that if $\mathbf{x} + \mathbf{y} = \mathbf{0}$, then $\mathbf{y} = -\mathbf{x}$.

7. ↑Show that in any vector space, $0\mathbf{x} = \mathbf{0}$. That is, the scalar 0 times the vector $\mathbf{x}$ gives the
vector $\mathbf{0}$.

8. ↑Show that in any vector space, $(-1)\mathbf{x} = -\mathbf{x}$.

9. Let $X$ be a vector space and suppose $\{\mathbf{x}_1, \cdots, \mathbf{x}_k\}$ is a set of vectors from $X$. Show that $\mathbf{0}$ is
in span$(\mathbf{x}_1, \cdots, \mathbf{x}_k)$.
10. Let $X$ consist of the real valued functions which are defined on an interval $[a, b]$. For $f, g \in X$,
$f + g$ is the name of the function which satisfies $(f + g)(x) = f(x) + g(x)$ and for $\alpha$ a
real number, $(\alpha f)(x) = \alpha (f(x))$. Show this is a vector space with field of scalars equal to $\mathbb{R}$.
Also explain why it cannot possibly be finite dimensional.

11. Let $S$ be a nonempty set and let $V$ denote the set of all functions which are defined on $S$ and
have values in $W$, a vector space having field of scalars $F$. Also define vector addition according
to the usual rule, $(f + g)(s) = f(s) + g(s)$ and scalar multiplication by $(\alpha f)(s) = \alpha f(s)$.
Show that $V$ is a vector space with field of scalars $F$.

12. Verify that any field F is a vector space with field of scalars F. However, show that R is a 
vector space with field of scalars Q. 

13. Let $F$ be a field and consider functions defined on $\{1, 2, \cdots, n\}$ having values in $F$. Explain
how, if $V$ is the set of all such functions, $V$ can be considered as $F^n$.

14. Let $V$ be the set of all functions defined on $\mathbb{N} = \{1, 2, \cdots\}$ having values in a field $F$ such
that vector addition and scalar multiplication are defined by $(\mathbf{f} + \mathbf{g})(s) = \mathbf{f}(s) + \mathbf{g}(s)$ and
$(\alpha \mathbf{f})(s) = \alpha \mathbf{f}(s)$ respectively, for $\mathbf{f}, \mathbf{g} \in V$ and $\alpha \in F$. Explain how this is a vector space and
show that for $\mathbf{e}_k$ given by

\[ \mathbf{e}_k(i) = \begin{cases} 1 & \text{if } i = k \\ 0 & \text{if } i \neq k \end{cases}, \]

the vectors $\{\mathbf{e}_k\}_{k=1}^{\infty}$ are linearly independent.

15. Suppose, in the context of Problem 10 you have smooth functions $\{y_1, y_2, \cdots, y_n\}$ (all derivatives
exist) defined on an interval $[a, b]$. Then the Wronskian of these functions is the determinant

\[ W(y_1, \cdots, y_n)(x) = \det \begin{pmatrix} y_1(x) & \cdots & y_n(x) \\ y_1'(x) & \cdots & y_n'(x) \\ \vdots & & \vdots \\ y_1^{(n-1)}(x) & \cdots & y_n^{(n-1)}(x) \end{pmatrix} \]

Show that if $W(y_1, \cdots, y_n)(x) \neq 0$ for some $x$, then the functions are linearly independent.

16. Give an example of two functions, $y_1, y_2$ defined on $[-1, 1]$ such that

\[ W(y_1, y_2)(x) = 0 \]

for all $x \in [-1, 1]$ and yet $\{y_1, y_2\}$ is linearly independent.

17. Let the vectors be polynomials of degree no more than 3. Show that with the usual definitions 
of scalar multiplication and addition wherein, for p (x) a polynomial, (ap) (x) = ap (x) and for 
p, q polynomials (p + q) (x) = p (x) + q (x) , this is a vector space. 

18. In the previous problem show that a basis for the vector space is $\{1, x, x^2, x^3\}$.

19. Let $V$ be the polynomials of degree no more than 3. Determine which of the following are
bases for this vector space.

(a) $\{x + 1,\ x^3 + x^2 + 2x,\ x^2 + x,\ x^3 + x^2 + x\}$

(b) $\{x^3 + 1,\ x^2 + x,\ 2x^3 + x^2,\ 2x^3 - x^2 - 3x + 1\}$

20. In the context of the above problem, consider polynomials

\[ \{a_i x^3 + b_i x^2 + c_i x + d_i,\ i = 1, 2, 3, 4\} \]

Show that this collection of polynomials is linearly independent on an interval $[a, b]$ if and only
if

\[ \begin{pmatrix} a_1 & b_1 & c_1 & d_1 \\ a_2 & b_2 & c_2 & d_2 \\ a_3 & b_3 & c_3 & d_3 \\ a_4 & b_4 & c_4 & d_4 \end{pmatrix} \]

is an invertible matrix.

21. Let the field of scalars be $\mathbb{Q}$, the rational numbers and let the vectors be of the form $a + b\sqrt{2}$
where $a, b$ are rational numbers. Show that this collection of vectors is a vector space with field
of scalars $\mathbb{Q}$ and give a basis for this vector space.

22. Suppose $V$ is a finite dimensional vector space. Based on the exchange theorem above, it was
shown that any two bases have the same number of vectors in them. Give a different proof of
this fact using the earlier material in the book. Hint: Suppose $\{\mathbf{x}_1, \cdots, \mathbf{x}_n\}$ and $\{\mathbf{y}_1, \cdots, \mathbf{y}_m\}$
are two bases with $m < n$. Then define

\[ \phi : F^n \to V, \quad \psi : F^m \to V \]

by

\[ \phi(\mathbf{a}) = \sum_{k=1}^{n} a_k \mathbf{x}_k, \quad \psi(\mathbf{b}) = \sum_{j=1}^{m} b_j \mathbf{y}_j \]

Consider the linear transformation, $\psi^{-1} \circ \phi$. Argue it is a one to one and onto mapping from
$F^n$ to $F^m$. Now consider a matrix of this linear transformation and its row reduced echelon
form.

23. This and the following problems will present most of a differential equations course. To begin
with, consider the scalar initial value problem

\[ y' = ay, \quad y(t_0) = y_0 \]

When $a$ is real, show the unique solution to this problem is $y = y_0 e^{a(t - t_0)}$. Next suppose

\[ y' = (a + ib)\, y, \quad y(t_0) = y_0 \tag{16.8} \]

where $y(t) = u(t) + iv(t)$. Show there exists a unique solution and it is

\[ y(t) = y_0 e^{a(t - t_0)} \left(\cos b(t - t_0) + i \sin b(t - t_0)\right) = e^{(a + ib)(t - t_0)} y_0. \tag{16.9} \]
Next show that for $a$ real or complex there exists a unique solution to the initial value problem

\[ y' = ay + f, \quad y(t_0) = y_0 \]

and it is given by

\[ y(t) = e^{a(t - t_0)} y_0 + e^{at} \int_{t_0}^{t} e^{-as} f(s)\, ds. \]

Hint: For the first part write as $y' - ay = 0$ and multiply both sides by $e^{-at}$. Then explain
why you get

\[ \frac{d}{dt}\left(e^{-at} y(t)\right) = 0, \quad y(t_0) = 0. \]

Now you finish the argument. To show uniqueness in the second part, suppose

\[ y' = (a + ib)\, y, \quad y(t_0) = 0 \]

and verify this requires $y(t) = 0$. To do this, note

\[ \bar{y}' = (a - ib)\, \bar{y}, \quad \bar{y}(t_0) = 0 \]

and that

\[ \begin{aligned} \frac{d}{dt} |y(t)|^2 &= y'(t)\,\bar{y}(t) + \bar{y}'(t)\, y(t) \\ &= (a + ib)\, y(t)\,\bar{y}(t) + (a - ib)\, \bar{y}(t)\, y(t) \\ &= 2a\, |y(t)|^2, \quad |y|^2(t_0) = 0. \end{aligned} \]

Thus from the first part $|y(t)|^2 = 0 \cdot e^{2a(t - t_0)} = 0$. Finally observe by a simple computation that
16.8 is solved by 16.9. For the last part, write the equation as

\[ y' - ay = f \]

and multiply both sides by $e^{-at}$ and then integrate from $t_0$ to $t$ using the initial condition.
24. ↑Now consider $A$ an $n \times n$ matrix. By Schur's theorem there exists unitary $Q$ such that

\[ Q^{-1} A Q = T \]

where $T$ is upper triangular. Now consider the first order initial value problem

\[ \mathbf{x}' = A\mathbf{x}, \quad \mathbf{x}(t_0) = \mathbf{x}_0. \]

Show there exists a unique solution to this first order system. Hint: Let $\mathbf{y} = Q^{-1}\mathbf{x}$ and so
the system becomes

\[ \mathbf{y}' = T\mathbf{y}, \quad \mathbf{y}(t_0) = Q^{-1}\mathbf{x}_0 \tag{16.10} \]

Now letting $\mathbf{y} = (y_1, \cdots, y_n)^T$, the bottom equation becomes

\[ y_n' = t_{nn}\, y_n, \quad y_n(t_0) = \left(Q^{-1}\mathbf{x}_0\right)_n. \]

Then use the solution you get in this to get the solution to the initial value problem which
occurs one level up, namely

\[ y_{n-1}' = t_{(n-1)(n-1)}\, y_{n-1} + t_{(n-1)n}\, y_n, \quad y_{n-1}(t_0) = \left(Q^{-1}\mathbf{x}_0\right)_{n-1} \]

Continue doing this to obtain a unique solution to 16.10.
25. ↑Now suppose $\Phi(t)$ is an $n \times n$ matrix of the form

\[ \Phi(t) = \begin{pmatrix} \mathbf{x}_1(t) & \cdots & \mathbf{x}_n(t) \end{pmatrix} \tag{16.11} \]

where

\[ \mathbf{x}_k'(t) = A\mathbf{x}_k(t). \]

Explain why

\[ \Phi'(t) = A\Phi(t) \]

if and only if $\Phi(t)$ is given in the form of 16.11. Also explain why if $\mathbf{c} \in F^n$,

\[ \mathbf{y}(t) = \Phi(t)\mathbf{c} \]

solves the equation

\[ \mathbf{y}'(t) = A\mathbf{y}(t). \]

26. ↑In the above problem, consider the question whether all solutions to

\[ \mathbf{x}' = A\mathbf{x} \tag{16.12} \]

are obtained in the form $\Phi(t)\mathbf{c}$ for some choice of $\mathbf{c} \in F^n$. In other words, is the general
solution to this equation $\Phi(t)\mathbf{c}$ for $\mathbf{c} \in F^n$? Prove the following theorem using linear algebra.

Theorem 16.4.1 Suppose $\Phi(t)$ is an $n \times n$ matrix which satisfies

\[ \Phi'(t) = A\Phi(t). \]

Then the general solution to 16.12 is $\Phi(t)\mathbf{c}$ if and only if $\Phi(t)^{-1}$ exists for some $t$. Furthermore,
if $\Phi'(t) = A\Phi(t)$, then either $\Phi(t)^{-1}$ exists for all $t$ or $\Phi(t)^{-1}$ never exists for any
$t$.

($\det(\Phi(t))$ is called the Wronskian and this theorem is sometimes called the Wronskian alternative.)

Hint: Suppose first the general solution is of the form $\Phi(t)\mathbf{c}$ where $\mathbf{c}$ is an arbitrary constant
vector in $F^n$. You need to verify $\Phi(t)^{-1}$ exists for some $t$. In fact, show $\Phi(t)^{-1}$ exists for every
$t$. Suppose then that $\Phi(t_0)^{-1}$ does not exist. Explain why there exists $\mathbf{c} \in F^n$ such that there
is no solution $\mathbf{x}$ to

\[ \mathbf{c} = \Phi(t_0)\mathbf{x} \]

By the existence part of Problem 24 there exists a solution to

\[ \mathbf{x}' = A\mathbf{x}, \quad \mathbf{x}(t_0) = \mathbf{c} \]

but this cannot be in the form $\Phi(t)\mathbf{c}$. Thus for every $t$, $\Phi(t)^{-1}$ exists. Next suppose for some
$t_0$, $\Phi(t_0)^{-1}$ exists. Let $\mathbf{z}' = A\mathbf{z}$ and choose $\mathbf{c}$ such that

\[ \mathbf{z}(t_0) = \Phi(t_0)\mathbf{c} \]

Then both $\mathbf{z}(t)$, $\Phi(t)\mathbf{c}$ solve

\[ \mathbf{x}' = A\mathbf{x}, \quad \mathbf{x}(t_0) = \mathbf{z}(t_0) \]

Apply uniqueness to conclude $\mathbf{z} = \Phi(t)\mathbf{c}$. Finally, consider that $\Phi(t)\mathbf{c}$ for $\mathbf{c} \in F^n$ either is the
general solution or it is not the general solution. If it is, then $\Phi(t)^{-1}$ exists for all $t$. If it is
not, then $\Phi(t)^{-1}$ cannot exist for any $t$ from what was just shown.
27. ↑Let $\Phi'(t) = A\Phi(t)$. Then $\Phi(t)$ is called a fundamental matrix if $\Phi(t)^{-1}$ exists for all $t$. Show
there exists a unique solution to the equation

\[ \mathbf{x}' = A\mathbf{x} + \mathbf{f}, \quad \mathbf{x}(t_0) = \mathbf{x}_0 \tag{16.13} \]

and it is given by the formula

\[ \mathbf{x}(t) = \Phi(t)\Phi(t_0)^{-1}\mathbf{x}_0 + \Phi(t) \int_{t_0}^{t} \Phi(s)^{-1}\mathbf{f}(s)\, ds \]

Now these few problems have done virtually everything of significance in an entire undergraduate
differential equations course, illustrating the superiority of linear algebra. The above
formula is called the variation of constants formula.

Hint: Uniqueness is easy. If $\mathbf{x}_1, \mathbf{x}_2$ are two solutions then let $\mathbf{u}(t) = \mathbf{x}_1(t) - \mathbf{x}_2(t)$ and argue
$\mathbf{u}' = A\mathbf{u}$, $\mathbf{u}(t_0) = \mathbf{0}$. Then use Problem 24. To verify there exists a solution, you could just
differentiate the above formula using the fundamental theorem of calculus and verify it works.
Another way is to assume the solution in the form

\[ \mathbf{x}(t) = \Phi(t)\mathbf{c}(t) \]

and find $\mathbf{c}(t)$ to make it all work out. This is called the method of variation of parameters.

28. ↑Show there exists a special $\Phi$ such that $\Phi'(t) = A\Phi(t)$, $\Phi(0) = I$, and $\Phi(t)^{-1}$ exists for all
$t$. Show using uniqueness that

\[ \Phi(-t) = \Phi(t)^{-1} \]

and that for all $t, s \in \mathbb{R}$

\[ \Phi(t + s) = \Phi(t)\Phi(s) \]

Explain why with this special $\Phi$, the solution to 16.13 can be written as

\[ \mathbf{x}(t) = \Phi(t - t_0)\mathbf{x}_0 + \int_{t_0}^{t} \Phi(t - s)\mathbf{f}(s)\, ds \]

Hint: Let $\Phi(t)$ be such that the $j^{th}$ column is $\mathbf{x}_j(t)$ where

\[ \mathbf{x}_j' = A\mathbf{x}_j, \quad \mathbf{x}_j(0) = \mathbf{e}_j. \]

Use uniqueness as required.

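29. ↑Using the Lindemann Weierstrass theorem show that if $\sigma$ is an algebraic number, then $\sin \sigma$, $\cos \sigma$, $\ln \sigma$,
and $e$ are all transcendental. Hint: Observe that

\[ e e^{-1} + (-1)e^{0} = 0, \quad 1 e^{\ln(\sigma)} + (-1)\sigma e^{0} = 0, \]
\[ \frac{1}{2i} e^{i\sigma} - \frac{1}{2i} e^{-i\sigma} + (-1)\sin(\sigma)\, e^{0} = 0. \]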

16.5 Inner Product Spaces 

16.5.1 Basic Definitions And Examples 

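An inner product space $V$ is a vector space which also has an inner product. It is usually assumed,
when considering inner product spaces that the field of scalars is either $F = \mathbb{R}$ or $\mathbb{C}$. This terminology
has already been considered in the context of $F^n$. In this section, it will be assumed that the field
of scalars is $\mathbb{C}$, the complex numbers, unless specified to be something else. An inner product is a
mapping $(\cdot, \cdot) : V \times V \to \mathbb{C}$ which satisfies the following axioms.

1. $(\mathbf{u}, \mathbf{v}) \in \mathbb{C}$, $(\mathbf{u}, \mathbf{v}) = \overline{(\mathbf{v}, \mathbf{u})}$.

2. If $a, b$ are numbers and $\mathbf{u}, \mathbf{v}, \mathbf{z}$ are vectors then $((a\mathbf{u} + b\mathbf{v}), \mathbf{z}) = a(\mathbf{u}, \mathbf{z}) + b(\mathbf{v}, \mathbf{z})$.

3. $(\mathbf{u}, \mathbf{u}) \ge 0$ and it equals 0 if and only if $\mathbf{u} = \mathbf{0}$.

Note this implies $(\mathbf{x}, \alpha\mathbf{y}) = \bar{\alpha}(\mathbf{x}, \mathbf{y})$ because

\[ (\mathbf{x}, \alpha\mathbf{y}) = \overline{(\alpha\mathbf{y}, \mathbf{x})} = \overline{\alpha(\mathbf{y}, \mathbf{x})} = \bar{\alpha}(\mathbf{x}, \mathbf{y}) \]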

Example 16.5.1 Let $V$ be the continuous complex valued functions defined on a finite closed interval
$I$. Define an inner product as follows.

\[ (f, g) \equiv \int_I f(x)\,\overline{g(x)}\, p(x)\, dx \]

where $p(x)$ is some function which is strictly positive on the closed interval $I$. It is understood in
writing this that

\[ \int_I \left(f(x) + i g(x)\right) dx \equiv \int_I f(x)\, dx + i \int_I g(x)\, dx \]

Then with this convention, the usual calculus theorems hold about evaluating integrals using the
fundamental theorem of calculus and so forth. You simply apply these theorems to the real and
imaginary parts of a complex valued function.

Example 16.5.2 Let $V$ be the polynomials of degree at most $n$ which are defined on a closed interval
$I$ and let $\{x_0, x_1, \cdots, x_n\}$ be $n + 1$ distinct points in $I$. Then define

\[ (f, g) \equiv \sum_{k=0}^{n} f(x_k)\,\overline{g(x_k)} \]

This last example clearly satisfies all the axioms for an inner product except for the one which
says that $(\mathbf{u}, \mathbf{u}) = 0$ if and only if $\mathbf{u} = \mathbf{0}$. Suppose then that $(f, f) = 0$. Then $f$ must vanish at
$n + 1$ distinct points but $f$ is a polynomial of degree $n$. Therefore, it has at most $n$ zeros unless it
is identically equal to 0. Hence the second case holds and so $f$ equals 0.

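Example 16.5.3 Let $V$ be any complex vector space and let $\{\mathbf{v}_1, \cdots, \mathbf{v}_n\}$ be a basis. Decree that

\[ (\mathbf{v}_i, \mathbf{v}_j) = \delta_{ij}. \]

Then define

\[ \left(\sum_{j=1}^{n} c_j \mathbf{v}_j,\ \sum_{k=1}^{n} d_k \mathbf{v}_k\right) \equiv \sum_{j,k} c_j \overline{d_k}\, (\mathbf{v}_j, \mathbf{v}_k) = \sum_{k=1}^{n} c_k \overline{d_k} \]

This makes the complex vector space into an inner product space.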
Example 16.5.4 Let $V$ consist of sequences $\mathbf{a} = \{a_k\}_{k=1}^{\infty}$, $a_k \in \mathbb{C}$, with the property that

\[ \sum_{k=1}^{\infty} |a_k|^2 < \infty \]

and the inner product is then defined as

\[ (\mathbf{a}, \mathbf{b}) \equiv \sum_{k=1}^{\infty} a_k \overline{b_k} \]


All of the axioms of the inner product are obvious for this example except the most basic one
which says that the inner product has values in $\mathbb{C}$. Why does the above sum even converge? It
converges from a comparison test.

\[ \left|a_k \overline{b_k}\right| \le \frac{|a_k|^2}{2} + \frac{|b_k|^2}{2} \]

and by assumption,

\[ \sum_{k=1}^{\infty} \left(\frac{|a_k|^2}{2} + \frac{|b_k|^2}{2}\right) < \infty \]

and therefore, the given sum which defines the inner product is absolutely convergent. Therefore,
thanks to completeness of $\mathbb{C}$ this sum also converges. This fact should be familiar to anyone who
has had a calculus class in the context that the sequences are real valued. The case where they are
complex valued follows right away from a consideration of real and imaginary parts.

By far the most important example of an inner product space is $L^2(\Omega)$, the space of Lebesgue
measurable square integrable functions defined on $\Omega$. However, this is a book on algebra, not
analysis, so this example will be ignored.

16.5.2 The Cauchy Schwarz Inequality And Norms 

The most fundamental theorem relative to inner products is the Cauchy Schwarz inequality. 

Theorem 16.5.5 (Cauchy Schwarz) The following inequality holds for $\mathbf{x}$ and $\mathbf{y} \in V$, an inner product
space.

\[ |(\mathbf{x}, \mathbf{y})| \le (\mathbf{x}, \mathbf{x})^{1/2} (\mathbf{y}, \mathbf{y})^{1/2} \tag{16.14} \]

Equality holds in this inequality if and only if one vector is a multiple of the other.

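Proof: Let $\theta \in \mathbb{C}$ such that $|\theta| = 1$ and

\[ \bar{\theta}(\mathbf{x}, \mathbf{y}) = |(\mathbf{x}, \mathbf{y})| \]

Consider $p(t) \equiv (\mathbf{x} + \theta t\mathbf{y}, \mathbf{x} + t\theta\mathbf{y})$ where $t \in \mathbb{R}$. Then from the above list of properties of the inner
product,

\[ \begin{aligned} 0 \le p(t) &= (\mathbf{x}, \mathbf{x}) + t\bar{\theta}(\mathbf{x}, \mathbf{y}) + t\theta(\mathbf{y}, \mathbf{x}) + t^2(\mathbf{y}, \mathbf{y}) \\ &= (\mathbf{x}, \mathbf{x}) + t\bar{\theta}(\mathbf{x}, \mathbf{y}) + t\,\overline{\bar{\theta}(\mathbf{x}, \mathbf{y})} + t^2(\mathbf{y}, \mathbf{y}) \\ &= (\mathbf{x}, \mathbf{x}) + 2t \operatorname{Re}\left(\bar{\theta}(\mathbf{x}, \mathbf{y})\right) + t^2(\mathbf{y}, \mathbf{y}) \\ &= (\mathbf{x}, \mathbf{x}) + 2t\,|(\mathbf{x}, \mathbf{y})| + t^2(\mathbf{y}, \mathbf{y}) \end{aligned} \tag{16.15} \]

and this must hold for all $t \in \mathbb{R}$. Therefore, if $(\mathbf{y}, \mathbf{y}) = 0$ it must be the case that $|(\mathbf{x}, \mathbf{y})| = 0$ also
since otherwise the above inequality would be violated for large negative $t$. Therefore, in this case,

\[ |(\mathbf{x}, \mathbf{y})| \le (\mathbf{x}, \mathbf{x})^{1/2} (\mathbf{y}, \mathbf{y})^{1/2}. \]

In the other case, if $(\mathbf{y}, \mathbf{y}) \neq 0$, then $p(t) \ge 0$ for all $t$ means the graph of $y = p(t)$ is a parabola
which opens up and it either has exactly one real zero in the case its vertex touches the $t$ axis or it
has no real zeros.

From the quadratic formula this happens exactly when

\[ 4\,|(\mathbf{x}, \mathbf{y})|^2 - 4(\mathbf{x}, \mathbf{x})(\mathbf{y}, \mathbf{y}) \le 0 \]

which is equivalent to 16.14.

It is clear from a computation that if one vector is a scalar multiple of the other that equality
holds in 16.14. Conversely, suppose equality does hold. Then this is equivalent to saying
$4\,|(\mathbf{x}, \mathbf{y})|^2 - 4(\mathbf{x}, \mathbf{x})(\mathbf{y}, \mathbf{y}) = 0$ and so from the quadratic formula, there exists one real zero to $p(t) = 0$. Call it
$t_0$. Then

\[ p(t_0) = (\mathbf{x} + \theta t_0\mathbf{y}, \mathbf{x} + t_0\theta\mathbf{y}) = |\mathbf{x} + \theta t_0\mathbf{y}|^2 = 0 \]

and so $\mathbf{x} = -\theta t_0\mathbf{y}$. ■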

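So what does the Cauchy Schwarz inequality say in the above examples? In Example 16.5.1 it
says that

\[ \left|\int_I f(x)\,\overline{g(x)}\, p(x)\, dx\right| \le \left(\int_I |f(x)|^2 p(x)\, dx\right)^{1/2} \left(\int_I |g(x)|^2 p(x)\, dx\right)^{1/2} \]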

With the Cauchy Schwarz inequality, it is possible to obtain the triangle inequality. This is the 
inequality in the next theorem. First it is necessary to define the norm or length of a vector. This 
is what is in the next definition. 

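Definition 16.5.6 Let $V$ be an inner product space and let $\mathbf{z} \in V$. Then $|\mathbf{z}| \equiv (\mathbf{z}, \mathbf{z})^{1/2}$. $|\mathbf{z}|$ is called
the norm of $\mathbf{z}$ and also the length of $\mathbf{z}$.

With the definition of length of a vector, here are the main properties of length.

Theorem 16.5.7 For length defined in Definition 16.5.6, the following hold.

\[ |\mathbf{z}| \ge 0 \text{ and } |\mathbf{z}| = 0 \text{ if and only if } \mathbf{z} = \mathbf{0} \tag{16.16} \]

\[ \text{If } \alpha \text{ is a scalar, } |\alpha\mathbf{z}| = |\alpha|\,|\mathbf{z}| \tag{16.17} \]

\[ |\mathbf{z} + \mathbf{w}| \le |\mathbf{z}| + |\mathbf{w}|. \tag{16.18} \]

Proof: The first two claims are left as exercises. To establish the third,

\[ \begin{aligned} |\mathbf{z} + \mathbf{w}|^2 &= (\mathbf{z} + \mathbf{w}, \mathbf{z} + \mathbf{w}) \\ &= (\mathbf{z}, \mathbf{z}) + (\mathbf{w}, \mathbf{w}) + (\mathbf{w}, \mathbf{z}) + (\mathbf{z}, \mathbf{w}) \\ &= |\mathbf{z}|^2 + |\mathbf{w}|^2 + 2\operatorname{Re}(\mathbf{w}, \mathbf{z}) \\ &\le |\mathbf{z}|^2 + |\mathbf{w}|^2 + 2\,|(\mathbf{w}, \mathbf{z})| \\ &\le |\mathbf{z}|^2 + |\mathbf{w}|^2 + 2\,|\mathbf{w}|\,|\mathbf{z}| = (|\mathbf{z}| + |\mathbf{w}|)^2. \ ■ \end{aligned} \]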
The properties 16.16 - 16.18 are the axioms for a norm. A vector space which has a norm is 
called a normed linear space or a normed vector space. 
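
As a quick numerical illustration of 16.14 and 16.18, here is a minimal sketch in Python. It assumes the standard inner product on $\mathbb{C}^5$, which is linear in the first argument and conjugate linear in the second as in the axioms above; the helper names `inner`, `norm` and `rand_vec` are made up for this illustration and are not part of the text.

```python
import random

def inner(x, y):
    """Standard inner product on C^n: linear in x, conjugate linear in y."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def norm(x):
    """Norm induced by the inner product, |x| = (x, x)^(1/2)."""
    return abs(inner(x, x)) ** 0.5

def rand_vec(n):
    """A random vector in C^n (hypothetical test data)."""
    return [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(n)]

random.seed(0)
x, y = rand_vec(5), rand_vec(5)

# Cauchy Schwarz: |(x, y)| <= |x| |y|, so this gap should be nonnegative.
cs_gap = norm(x) * norm(y) - abs(inner(x, y))

# Triangle inequality: |x + y| <= |x| + |y|.
tri_gap = norm(x) + norm(y) - norm([a + b for a, b in zip(x, y)])

# Equality case: z is a scalar multiple of x, so the gap should vanish.
z = [(2 - 3j) * a for a in x]
eq_gap = norm(x) * norm(z) - abs(inner(x, z))
```

The two gaps are nonnegative for any choice of vectors, while `eq_gap` is zero up to rounding because one vector is a multiple of the other.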

16.5.3 The Gram Schmidt Process 

The Gram Schmidt process is also valid in an inner product space. If you have a linearly independent 
set of vectors, there is an orthonormal set of vectors which has the same span. Recall the definition 
of an orthonormal set. It is the same as before. 

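Definition 16.5.8 Let $V$ be an inner product space and let $\{\mathbf{u}_k\}$ be a collection of vectors. It is an
orthonormal set if

\[ (\mathbf{u}_k, \mathbf{u}_j) = \delta_{jk}. \]

As before, every orthonormal set of vectors is linearly independent. If

\[ \sum_{k=1}^{n} c_k \mathbf{u}_k = \mathbf{0} \]

where $\{\mathbf{u}_k\}_{k=1}^{n}$ is an orthonormal set of vectors, why is each $c_k = 0$?
This is true because you can take the inner product of both sides with $\mathbf{u}_j$. Then

\[ \left(\sum_{k=1}^{n} c_k \mathbf{u}_k,\ \mathbf{u}_j\right) = (\mathbf{0}, \mathbf{u}_j). \]

The right side equals 0 because

\[ (\mathbf{0}, \mathbf{u}) = (\mathbf{0} + \mathbf{0}, \mathbf{u}) = (\mathbf{0}, \mathbf{u}) + (\mathbf{0}, \mathbf{u}) \]

Subtracting $(\mathbf{0}, \mathbf{u})$ from both sides shows that $(\mathbf{0}, \mathbf{u}) = 0$. Therefore, from the properties of the inner
product,

\[ 0 = (\mathbf{0}, \mathbf{u}_j) = \left(\sum_{k=1}^{n} c_k \mathbf{u}_k,\ \mathbf{u}_j\right) = \sum_{k=1}^{n} c_k (\mathbf{u}_k, \mathbf{u}_j) = \sum_{k=1}^{n} c_k \delta_{kj} = c_j. \]

Since $c_j$ was arbitrary, this verifies that an orthonormal set of vectors is linearly independent.
Now consider the Gram Schmidt process.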

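Lemma 16.5.9 Let $\{\mathbf{x}_1, \cdots, \mathbf{x}_n\}$ be a linearly independent subset of an inner product space $V$. Then
there exists an orthonormal set of vectors $\{\mathbf{u}_1, \cdots, \mathbf{u}_n\}$ which has the property that for each $k \le n$,
span$(\mathbf{x}_1, \cdots, \mathbf{x}_k)$ = span$(\mathbf{u}_1, \cdots, \mathbf{u}_k)$.

Proof: Let $\mathbf{u}_1 \equiv \mathbf{x}_1 / |\mathbf{x}_1|$. Thus for $k = 1$, span$(\mathbf{u}_1)$ = span$(\mathbf{x}_1)$ and $\{\mathbf{u}_1\}$ is an orthonormal
set. Now suppose for some $k < n$, $\mathbf{u}_1, \cdots, \mathbf{u}_k$ have been chosen such that $(\mathbf{u}_j, \mathbf{u}_l) = \delta_{jl}$ and
span$(\mathbf{x}_1, \cdots, \mathbf{x}_k)$ = span$(\mathbf{u}_1, \cdots, \mathbf{u}_k)$. Then define

\[ \mathbf{u}_{k+1} \equiv \frac{\mathbf{x}_{k+1} - \sum_{j=1}^{k} (\mathbf{x}_{k+1}, \mathbf{u}_j)\, \mathbf{u}_j}{\left|\mathbf{x}_{k+1} - \sum_{j=1}^{k} (\mathbf{x}_{k+1}, \mathbf{u}_j)\, \mathbf{u}_j\right|}, \tag{16.19} \]

where the denominator is not equal to zero because the $\mathbf{x}_j$ are linearly independent, and so

\[ \mathbf{x}_{k+1} \notin \text{span}(\mathbf{x}_1, \cdots, \mathbf{x}_k) = \text{span}(\mathbf{u}_1, \cdots, \mathbf{u}_k) \]

Thus by induction,

\[ \mathbf{u}_{k+1} \in \text{span}(\mathbf{u}_1, \cdots, \mathbf{u}_k, \mathbf{x}_{k+1}) = \text{span}(\mathbf{x}_1, \cdots, \mathbf{x}_k, \mathbf{x}_{k+1}). \]

Also, $\mathbf{x}_{k+1} \in$ span$(\mathbf{u}_1, \cdots, \mathbf{u}_k, \mathbf{u}_{k+1})$ which is seen easily by solving 16.19 for $\mathbf{x}_{k+1}$ and it follows

\[ \text{span}(\mathbf{x}_1, \cdots, \mathbf{x}_k, \mathbf{x}_{k+1}) = \text{span}(\mathbf{u}_1, \cdots, \mathbf{u}_k, \mathbf{u}_{k+1}). \]

If $l \le k$, then letting $C$ denote the reciprocal of the denominator in 16.19,

\[ \begin{aligned} (\mathbf{u}_{k+1}, \mathbf{u}_l) &= C\left((\mathbf{x}_{k+1}, \mathbf{u}_l) - \sum_{j=1}^{k} (\mathbf{x}_{k+1}, \mathbf{u}_j)(\mathbf{u}_j, \mathbf{u}_l)\right) \\ &= C\left((\mathbf{x}_{k+1}, \mathbf{u}_l) - \sum_{j=1}^{k} (\mathbf{x}_{k+1}, \mathbf{u}_j)\, \delta_{lj}\right) \\ &= C\left((\mathbf{x}_{k+1}, \mathbf{u}_l) - (\mathbf{x}_{k+1}, \mathbf{u}_l)\right) = 0. \end{aligned} \]

The vectors, $\{\mathbf{u}_j\}_{j=1}^{n}$, generated in this way are therefore orthonormal because each vector has unit
length. ■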
As in the case of $F^n$, if you have a finite dimensional subspace of an inner product space, you
can begin with a basis and then apply the Gram Schmidt process above to obtain an orthonormal
basis.
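
The recursion in the Gram Schmidt process can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the vectors live in $\mathbb{C}^3$ with the standard inner product, the function names (`inner`, `norm`, `gram_schmidt`) and the tolerance `1e-12` are choices made here, and the three input vectors are hypothetical test data.

```python
def inner(x, y):
    """Standard inner product on C^n: linear in x, conjugate linear in y."""
    return sum(a * complex(b).conjugate() for a, b in zip(x, y))

def norm(x):
    return abs(inner(x, x)) ** 0.5

def gram_schmidt(xs):
    """Return orthonormal u_1..u_n with span(u_1..u_k) = span(x_1..x_k) for each k."""
    us = []
    for x in xs:
        # Subtract from x its components along the u_j already chosen.
        w = [complex(c) for c in x]
        for u in us:
            coeff = inner(x, u)
            w = [wi - coeff * ui for wi, ui in zip(w, u)]
        length = norm(w)
        if length < 1e-12:
            raise ValueError("vectors are not linearly independent")
        # Normalize so the new vector has unit length.
        us.append([wi / length for wi in w])
    return us

xs = [[1, 1, 0], [1, 0, 1j], [0, 1, 1]]   # a linearly independent set in C^3
us = gram_schmidt(xs)

# The matrix of inner products (u_j, u_k) should be the identity.
gram = [[inner(u, v) for v in us] for u in us]
```

Each pass through the loop computes the numerator of the inductive step and then divides by its length; the check on `length` corresponds to the observation that the denominator is nonzero exactly when the given vectors are linearly independent.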

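There is nothing wrong with the above algorithm, but when you use it, it tends to get pretty
intricate and it is easy to get lost in the details. There is a way to simplify it to produce fewer
steps using matrices. I will illustrate in the case of three vectors. Say $\{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3\}$ is linearly
independent and you wish to find an orthonormal set $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ which has the same span such
that span$(\mathbf{u}_1, \cdots, \mathbf{u}_k)$ = span$(\mathbf{v}_1, \cdots, \mathbf{v}_k)$ for each $k = 1, 2, 3$. Then you would have

\[ \begin{pmatrix} \mathbf{v}_1 & \mathbf{v}_2 & \mathbf{v}_3 \end{pmatrix} = \begin{pmatrix} \mathbf{u}_1 & \mathbf{u}_2 & \mathbf{u}_3 \end{pmatrix} R \]

where $R$ is an upper triangular matrix. Then

\[ \delta_{jk} = (\mathbf{v}_j, \mathbf{v}_k) = \left(\sum_{r=1}^{3} \mathbf{u}_r R_{rj},\ \sum_{s=1}^{3} \mathbf{u}_s R_{sk}\right) = \sum_{r,s} R_{rj}\, (\mathbf{u}_r, \mathbf{u}_s)\, \overline{R_{sk}} \]

Let $G$ be the matrix whose $rs$ entry is $(\mathbf{u}_r, \mathbf{u}_s)$. This is called the Grammian matrix. Then the
above reduces to the following matrix equation.

\[ I = R^T G \overline{R} \]

Taking inverses of both sides yields

\[ I = \overline{R}^{-1} G^{-1} \left(R^T\right)^{-1} \]

Then it follows that

\[ \overline{R} R^T = G^{-1}. \tag{16.20} \]

When the inner product space is real, the conjugates can be dropped and this reads $R R^T = G^{-1}$.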

Example 16.5.10 Let the real inner product space $V$ consist of the continuous functions defined
on $[0, 1]$ with the inner product given by

\[ (f, g) \equiv \int_0^1 f(x)\, g(x)\, dx \]

and consider the functions (vectors) $\{1, x, x^2, x^3\}$. Show this is a linearly independent set of vectors
and obtain an orthonormal set of vectors having the same span.

First, why is this a linearly independent set of vectors? This follows easily from Problem 15 on
Page 446. You consider the Wronskian of these functions.

\[ \det \begin{pmatrix} 1 & x & x^2 & x^3 \\ 0 & 1 & 2x & 3x^2 \\ 0 & 0 & 2 & 6x \\ 0 & 0 & 0 & 6 \end{pmatrix} \neq 0. \]

Therefore, the vectors are linearly independent. Now following the above procedure with matrices,
let

\[ R = \begin{pmatrix} a_1 & a_2 & a_3 & a_4 \\ 0 & a_5 & a_6 & a_7 \\ 0 & 0 & a_8 & a_9 \\ 0 & 0 & 0 & a_{10} \end{pmatrix} \]

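Also it is necessary to compute the Grammian.

\[ G = \begin{pmatrix} \int_0^1 dx & \int_0^1 x\, dx & \int_0^1 x^2 dx & \int_0^1 x^3 dx \\ \int_0^1 x\, dx & \int_0^1 x^2 dx & \int_0^1 x^3 dx & \int_0^1 x^4 dx \\ \int_0^1 x^2 dx & \int_0^1 x^3 dx & \int_0^1 x^4 dx & \int_0^1 x^5 dx \\ \int_0^1 x^3 dx & \int_0^1 x^4 dx & \int_0^1 x^5 dx & \int_0^1 x^6 dx \end{pmatrix} = \begin{pmatrix} 1 & \frac{1}{2} & \frac{1}{3} & \frac{1}{4} \\ \frac{1}{2} & \frac{1}{3} & \frac{1}{4} & \frac{1}{5} \\ \frac{1}{3} & \frac{1}{4} & \frac{1}{5} & \frac{1}{6} \\ \frac{1}{4} & \frac{1}{5} & \frac{1}{6} & \frac{1}{7} \end{pmatrix} \]

You also need the inverse of this Grammian. However, a computer algebra system can provide this
right away.

\[ G^{-1} = \begin{pmatrix} 16 & -120 & 240 & -140 \\ -120 & 1200 & -2700 & 1680 \\ 240 & -2700 & 6480 & -4200 \\ -140 & 1680 & -4200 & 2800 \end{pmatrix} \]
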

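Now it just remains to find the $a_i$.

\[ \begin{pmatrix} a_1 & a_2 & a_3 & a_4 \\ 0 & a_5 & a_6 & a_7 \\ 0 & 0 & a_8 & a_9 \\ 0 & 0 & 0 & a_{10} \end{pmatrix} \begin{pmatrix} a_1 & a_2 & a_3 & a_4 \\ 0 & a_5 & a_6 & a_7 \\ 0 & 0 & a_8 & a_9 \\ 0 & 0 & 0 & a_{10} \end{pmatrix}^T \]

\[ = \begin{pmatrix} a_1^2 + a_2^2 + a_3^2 + a_4^2 & a_2 a_5 + a_3 a_6 + a_4 a_7 & a_3 a_8 + a_4 a_9 & a_4 a_{10} \\ a_2 a_5 + a_3 a_6 + a_4 a_7 & a_5^2 + a_6^2 + a_7^2 & a_6 a_8 + a_7 a_9 & a_7 a_{10} \\ a_3 a_8 + a_4 a_9 & a_6 a_8 + a_7 a_9 & a_8^2 + a_9^2 & a_9 a_{10} \\ a_4 a_{10} & a_7 a_{10} & a_9 a_{10} & a_{10}^2 \end{pmatrix} \]

\[ = \begin{pmatrix} 16 & -120 & 240 & -140 \\ -120 & 1200 & -2700 & 1680 \\ 240 & -2700 & 6480 & -4200 \\ -140 & 1680 & -4200 & 2800 \end{pmatrix} \]
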

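Thus you can take (There is more than one solution.)

\[ a_{10} = \sqrt{2800} = 20\sqrt{7},\ a_9 = -30\sqrt{7},\ a_8 = 6\sqrt{5},\ a_7 = 12\sqrt{7},\ a_6 = -6\sqrt{5}, \]
\[ a_5 = 2\sqrt{3},\ a_4 = -\sqrt{7},\ a_3 = \sqrt{5},\ a_2 = -\sqrt{3},\ a_1 = 1 \]

Thus the desired orthonormal basis is given by

\[ \begin{pmatrix} 1 & x & x^2 & x^3 \end{pmatrix} \begin{pmatrix} 1 & -\sqrt{3} & \sqrt{5} & -\sqrt{7} \\ 0 & 2\sqrt{3} & -6\sqrt{5} & 12\sqrt{7} \\ 0 & 0 & 6\sqrt{5} & -30\sqrt{7} \\ 0 & 0 & 0 & 20\sqrt{7} \end{pmatrix} \]

which yields

\[ 1,\quad 2\sqrt{3}x - \sqrt{3},\quad 6\sqrt{5}x^2 - 6\sqrt{5}x + \sqrt{5},\quad 20\sqrt{7}x^3 - 30\sqrt{7}x^2 + 12\sqrt{7}x - \sqrt{7} \]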

16.5.4 Approximation And Least Squares 

Let $V$ be an inner product space and let $U$ be a finite dimensional subspace. Given $\mathbf{y} \in V$, how can
you find the vector of $U$ which is closest to $\mathbf{y}$ out of all such vectors in $U$? Does there even exist
such a closest vector? The following picture is suggestive of the conclusion of the following lemma.
It turns out that pictures like this do not mislead when you are dealing with inner product spaces
in any number of dimensions.
Note that in the picture, $\mathbf{z}$ is a point in $U$ and also $\mathbf{w}$ is a point of $U$. The following lemma states
that for $\mathbf{z}$ to be closest to $\mathbf{y}$ out of all vectors in $U$, the vector from $\mathbf{z}$ to $\mathbf{y}$ should be perpendicular to
any vector $\mathbf{w} \in U$. Since $U$ is a subspace, this is the same as saying that the vector $\mathbf{zy}$ is perpendicular
to the vector $\mathbf{zw}$ which is the situation illustrated by the above picture.

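Lemma 16.5.11 Suppose $\mathbf{y} \in V$, an inner product space and $U$ is a subspace of $V$. Then

\[ |\mathbf{y} - \mathbf{z}| \le |\mathbf{y} - \mathbf{w}| \]

for all $\mathbf{w} \in U$ if and only if for all $\mathbf{w} \in U$,

\[ (\mathbf{y} - \mathbf{z}, \mathbf{w}) = 0. \tag{16.21} \]

Furthermore, there is at most one $\mathbf{z}$ which minimizes $|\mathbf{y} - \mathbf{w}|$ for $\mathbf{w} \in U$.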

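Proof: First suppose condition 16.21. Letting $\mathbf{w} \in U$, and using the properties of the inner
product and the definition of the norm,

\[ |\mathbf{y} - \mathbf{w}|^2 = |\mathbf{y} - \mathbf{z} + \mathbf{z} - \mathbf{w}|^2 = |\mathbf{y} - \mathbf{z}|^2 + |\mathbf{z} - \mathbf{w}|^2 + 2\operatorname{Re}(\mathbf{y} - \mathbf{z}, \mathbf{z} - \mathbf{w}) = |\mathbf{y} - \mathbf{z}|^2 + |\mathbf{z} - \mathbf{w}|^2 \]

because $\mathbf{z} - \mathbf{w} \in U$. It follows then that $|\mathbf{y} - \mathbf{w}|$ is minimized when $\mathbf{w} = \mathbf{z}$. Next suppose $\mathbf{z}$ is a minimizer. Then pick
$\mathbf{w} \in U$ and let $t \in \mathbb{R}$. Let $\theta \in \mathbb{C}$ be such that $|\theta| = 1$ and $\bar{\theta}(\mathbf{y} - \mathbf{z}, \mathbf{w}) = |(\mathbf{y} - \mathbf{z}, \mathbf{w})|$. Then

\[ t \mapsto |(\mathbf{y} - \mathbf{z}) + t\theta\mathbf{w}|^2 \]

has a minimum when $t = 0$. But from the axioms of the inner product and definition of the norm,
this function of $t$ equals

\[ \begin{aligned} &|\mathbf{y} - \mathbf{z}|^2 + t^2 |\mathbf{w}|^2 + 2t \operatorname{Re}\left(\bar{\theta}(\mathbf{y} - \mathbf{z}, \mathbf{w})\right) \\ &= |\mathbf{y} - \mathbf{z}|^2 + t^2 |\mathbf{w}|^2 + 2t\, |(\mathbf{y} - \mathbf{z}, \mathbf{w})| \end{aligned} \]

Hence its derivative when $t = 0$ which is $2\,|(\mathbf{y} - \mathbf{z}, \mathbf{w})|$ equals 0.

Suppose now that $\mathbf{z}_i$, $i = 1, 2$ both are minimizers. Then, as above,

\[ |\mathbf{y} - \mathbf{z}_1|^2 = |\mathbf{y} - \mathbf{z}_2|^2 + |\mathbf{z}_1 - \mathbf{z}_2|^2 \]

and this is a contradiction unless $\mathbf{z}_1 = \mathbf{z}_2$ because $|\mathbf{y} - \mathbf{z}_1| = |\mathbf{y} - \mathbf{z}_2|$. ■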

This $\mathbf{z}$ described above is called the orthogonal projection of $\mathbf{y}$ onto $U$. The picture suggests
why it is called this. The vector $\mathbf{y} - \mathbf{z}$ is perpendicular to the vectors in $U$.

With the above lemma, here is a theorem about existence, uniqueness and properties of a minimizer.
The following theorem shows that the orthogonal projection is obtained by the linear transformation
given by the formula

\[ T\mathbf{y} \equiv \sum_{k=1}^{n} (\mathbf{y}, \mathbf{u}_k)\, \mathbf{u}_k \]

where $\{\mathbf{u}_1, \cdots, \mathbf{u}_n\}$ is an orthonormal basis for $U$.
Note that the formula as well as the geometric interpretation suggested in the above picture shows
that $T^2 = T$.

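Theorem 16.5.12 Let $V$ be an inner product space and let $U$ be an $n$ dimensional subspace of $V$.
Then if $\mathbf{y} \in V$ is given, there exists a unique $\mathbf{x} \in U$ such that

\[ |\mathbf{y} - \mathbf{x}| \le |\mathbf{y} - \mathbf{w}| \]

for all $\mathbf{w} \in U$ and in addition, there is a formula for $\mathbf{x}$ in terms of any orthonormal basis for
$U$, $\{\mathbf{u}_1, \cdots, \mathbf{u}_n\}$,

\[ \mathbf{x} = \sum_{k=1}^{n} (\mathbf{y}, \mathbf{u}_k)\, \mathbf{u}_k \]

Proof: By Lemma 16.5.11 there is at most one minimizer and it is characterized by the condition

\[ (\mathbf{y} - \mathbf{x}, \mathbf{w}) = 0 \]

for all $\mathbf{w} \in U$. Let $\{\mathbf{u}_k\}_{k=1}^{n}$ be an orthonormal basis for $U$. By the Gram Schmidt process, Lemma
16.5.9, there exists such an orthonormal basis. Now it only remains to verify that

\[ \left(\mathbf{y} - \sum_{k=1}^{n} (\mathbf{y}, \mathbf{u}_k)\, \mathbf{u}_k,\ \mathbf{w}\right) = 0 \]

for all $\mathbf{w}$. Since $\{\mathbf{u}_k\}_{k=1}^{n}$ is a basis, it suffices to verify that

\[ \left(\mathbf{y} - \sum_{k=1}^{n} (\mathbf{y}, \mathbf{u}_k)\, \mathbf{u}_k,\ \mathbf{u}_l\right) = 0, \text{ all } l = 1, 2, \cdots, n \]

However, from the properties of the inner product,

\[ \begin{aligned} \left(\mathbf{y} - \sum_{k=1}^{n} (\mathbf{y}, \mathbf{u}_k)\, \mathbf{u}_k,\ \mathbf{u}_l\right) &= (\mathbf{y}, \mathbf{u}_l) - \sum_{k=1}^{n} (\mathbf{y}, \mathbf{u}_k)(\mathbf{u}_k, \mathbf{u}_l) \\ &= (\mathbf{y}, \mathbf{u}_l) - \sum_{k=1}^{n} (\mathbf{y}, \mathbf{u}_k)\, \delta_{kl} = (\mathbf{y}, \mathbf{u}_l) - (\mathbf{y}, \mathbf{u}_l) = 0. \ ■ \end{aligned} \]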

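Note it follows that for any orthonormal basis $\{\mathbf{u}_k\}_{k=1}^{n}$, the same unique vector $\mathbf{x}$ is obtained as

\[ \sum_{k=1}^{n} (\mathbf{y}, \mathbf{u}_k)\, \mathbf{u}_k \]

and it is the unique minimizer of $\mathbf{w} \mapsto |\mathbf{y} - \mathbf{w}|$. This is stated in the following corollary for the sake
of emphasis.

Corollary 16.5.13 Let $V$ be an inner product space and let $U$ be an $n$ dimensional subspace of $V$.
Then for $\mathbf{y}$ given in $V$, and $\{\mathbf{u}_k\}_{k=1}^{n}$, $\{\mathbf{v}_k\}_{k=1}^{n}$ two orthonormal bases for $U$,

\[ \sum_{k=1}^{n} (\mathbf{y}, \mathbf{u}_k)\, \mathbf{u}_k = \sum_{k=1}^{n} (\mathbf{y}, \mathbf{v}_k)\, \mathbf{v}_k \]

The scalars $(\mathbf{y}, \mathbf{u}_k)$ are called the Fourier coefficients.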
Example 16.5.14 Let $V$ denote the real inner product space consisting of continuous functions
defined on $[0, 1]$ with the inner product

\[ (f, g) \equiv \int_0^1 f(x)\, g(x)\, dx. \]

Let $U$ = span$(1, x, x^2)$. It is desired to find the vector (function) in $U$ which is closest to $\sin$ in the
norm determined by this inner product. Thus it is desired to minimize

\[ \left(\int_0^1 |\sin(x) - p(x)|^2\, dx\right)^{1/2} \]

out of all functions $p$ contained in $U$.

By Example 16.5.10, an orthonormal basis for $U$ is

\[ \left\{1,\ 2\sqrt{3}x - \sqrt{3},\ 6\sqrt{5}x^2 - 6\sqrt{5}x + \sqrt{5}\right\} \]

Then by Theorem 16.5.12, the closest vector (function) in $U$ to $\sin$ can be computed as follows. First
determine the Fourier coefficients.

\[ \int_0^1 1 \cdot \sin(x)\, dx = 1 - \cos(1) \]

\[ \int_0^1 \left(2\sqrt{3}x - \sqrt{3}\right) \sin(x)\, dx = \sqrt{3}\left(-\cos 1 + 2\sin 1 - 1\right) \]

\[ \int_0^1 \left(6\sqrt{5}x^2 - 6\sqrt{5}x + \sqrt{5}\right) \sin(x)\, dx = \sqrt{5}\left(11\cos 1 + 6\sin 1 - 11\right) \]

Next, from Theorem 16.5.12, the closest point to $\sin$ is

\[ (1 - \cos(1)) + \left(\sqrt{3}\left(-\cos 1 + 2\sin 1 - 1\right)\right)\left(2\sqrt{3}x - \sqrt{3}\right) + \left(\sqrt{5}\left(11\cos 1 + 6\sin 1 - 11\right)\right)\left(6\sqrt{5}x^2 - 6\sqrt{5}x + \sqrt{5}\right) \]

Simplifying and approximating things like $\sin 1$, this yields the following for the approximation to
$\sin x$.

\[ -0.23546x^2 + 1.0913x - 7.4649 \times 10^{-3} \]

If this is graphed along with $\sin x$ for $x \in [0, 1]$ the result is as follows.

[Figure: graphs of $\sin x$ and the quadratic approximation on $[0, 1]$]

One of the functions is represented by the solid line and the other by the dashed line. Now by
contrast, consider the Taylor series for $\sin x$ up to degree 2. This is just $f(x) = x + 0x^2$.




You see the difference. The approximation using the inner product norm, called mean square 
approximation, attempts to approximate the given function on the whole interval while the Taylor 
series approximation is only good for small x. 
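
The computation in Example 16.5.14 can be reproduced numerically. The sketch below is one way to do it in plain Python: the integrals defining the Fourier coefficients are approximated with Simpson's rule (a choice made here, the example itself evaluates them exactly), and the helper names `simpson` and `evalp` are assumptions of this illustration.

```python
from math import sin, sqrt

def simpson(f, a, b, n=1000):
    """Composite Simpson approximation of the integral of f over [a, b], n even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

s3, s5 = sqrt(3), sqrt(5)

# Orthonormal basis of span(1, x, x^2) from Example 16.5.10, stored as
# coefficient lists [c0, c1, c2] meaning c0 + c1 x + c2 x^2.
basis = [[1, 0, 0],
         [-s3, 2 * s3, 0],
         [s5, -6 * s5, 6 * s5]]

def evalp(c, x):
    return c[0] + c[1] * x + c[2] * x * x

# Fourier coefficients (sin, u_k) for each basis polynomial u_k.
fourier = [simpson(lambda x, c=c: evalp(c, x) * sin(x), 0.0, 1.0) for c in basis]

# Closest polynomial: sum of (sin, u_k) u_k, collected by powers of x.
best = [sum(fc * c[i] for fc, c in zip(fourier, basis)) for i in range(3)]
# best[0], best[1], best[2] are the constant, x and x^2 coefficients.
```

Under these assumptions `best` agrees with the coefficients $-7.4649 \times 10^{-3}$, $1.0913$ and $-0.23546$ stated in the example to the digits shown there.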
16.5.5 Fourier Series

One of the most important applications of these ideas about approximation is to Fourier series. 
Much more can be said about these than will be presented here. However, Theorem 16.5.12 is a 
very useful framework for discussing these series. 
For $x \in \mathbb{R}$, define $e^{ix}$ by the following formula

\[ e^{ix} \equiv \cos x + i \sin x \]

The reason for defining it this way is that $e^{i0} = 1$, and $\left(e^{ix}\right)' = ie^{ix}$ if you use this definition. Also
it follows from the trigonometry identities that

\[ e^{i(x+y)} = e^{ix} e^{iy} \]

This is because

\[ \begin{aligned} e^{ix} e^{iy} &= (\cos x + i \sin x)(\cos y + i \sin y) \\ &= \cos x \cos y - \sin x \sin y + i(\sin x \cos y + \cos x \sin y) \\ &= \cos(x + y) + i \sin(x + y) = e^{i(x+y)} \end{aligned} \]

In addition,

\[ \overline{e^{ix}} = e^{-ix} \]

because

\[ \overline{e^{ix}} = \cos x - i \sin x = \cos(-x) + i \sin(-x) = e^{-ix} \]

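It follows that the functions $\frac{1}{\sqrt{2\pi}} e^{ikx}$ for

\[ k \in \{-n, -(n-1), \cdots, -1, 0, 1, \cdots, (n-1), n\} \equiv J_n \]

form an orthonormal set in $V$, the inner product space of continuous functions defined on $[-\pi, \pi]$
with the inner product given by

\[ \int_{-\pi}^{\pi} f(x)\,\overline{g(x)}\, dx \equiv (f, g). \]

I will verify this now. Let $k, l$ be two integers in $J_n$, $k \neq l$. Since the sine of an integer multiple of $\pi$ is zero,

\[ \int_{-\pi}^{\pi} e^{ikx}\,\overline{e^{ilx}}\, dx = \int_{-\pi}^{\pi} e^{i(k-l)x}\, dx = \left.\frac{e^{i(k-l)x}}{i(k-l)}\right|_{-\pi}^{\pi} = \frac{1}{i(k-l)}\left(\cos((k-l)\pi) - \cos((k-l)(-\pi))\right) = 0 \]

Also

\[ \int_{-\pi}^{\pi} \frac{1}{\sqrt{2\pi}} e^{ikx}\, \frac{1}{\sqrt{2\pi}}\,\overline{e^{ikx}}\, dx = \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{0x}\, dx = 1. \]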
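
The orthonormality just verified can also be observed numerically. The following is a small sketch in plain Python which approximates the inner product on $[-\pi, \pi]$ by a midpoint Riemann sum; the quadrature rule, the number of sample points, and the names `inner` and `e` are assumptions made for this illustration.

```python
import cmath
from math import pi, sqrt

def inner(f, g, n=2000):
    """Midpoint rule approximation of the integral of f * conj(g) over [-pi, pi]."""
    h = 2 * pi / n
    total = 0
    for j in range(n):
        x = -pi + (j + 0.5) * h
        total += f(x) * g(x).conjugate()
    return total * h

def e(k):
    """The function x -> e^{ikx} / sqrt(2 pi)."""
    return lambda x: cmath.exp(1j * k * x) / sqrt(2 * pi)

n = 3
# Matrix of inner products (e_k, e_l) for k, l in J_n = {-n, ..., n}.
gram = {(k, l): inner(e(k), e(l))
        for k in range(-n, n + 1) for l in range(-n, n + 1)}

max_error = max(abs(v - (1 if k == l else 0)) for (k, l), v in gram.items())
```

Here `max_error` measures how far the computed inner products are from $\delta_{kl}$; it is at the level of rounding error because a Riemann sum with equally spaced points integrates these exponentials essentially exactly.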



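Example 16.5.15 Let $V$ be the inner product space of piecewise continuous functions defined on
$[-\pi, \pi]$ and let $U$ denote the span of the functions (vectors) $\left\{\frac{1}{\sqrt{2\pi}} e^{ikx}\right\}_{k=-n}^{n}$. Let

\[ f(x) = \begin{cases} 1 & \text{if } x \ge 0 \\ -1 & \text{if } x < 0 \end{cases} \]

Find the vector of $U$ which is closest to $f$ in the mean square sense (In the norm defined by this
inner product).

First of all, you need to find the Fourier coefficients. Since $x \mapsto \cos(x)$ is even and $x \mapsto \sin(x)$
is odd, for $k \neq 0$,

\[ \int_{-\pi}^{\pi} f(x)\, \frac{1}{\sqrt{2\pi}} e^{-ikx}\, dx = \frac{2i}{\sqrt{2\pi}} \int_0^{\pi} \sin(-kx)\, dx = \frac{-i\sqrt{2}}{\sqrt{\pi}}\, \frac{1 - \cos(k\pi)}{k}, \]

and for $k = 0$,

\[ \frac{1}{\sqrt{2\pi}} \int_{-\pi}^{\pi} f(x)\, dx = 0. \]

Therefore, the best approximation is

\[ \sum_{k=-n,\ k \neq 0}^{n} \left(\frac{-i\left(1 - \cos(k\pi)\right)}{k}\right) \frac{\sqrt{2}}{\sqrt{\pi}}\, \frac{1}{\sqrt{2\pi}}\, e^{ikx} \]

The term for $k$ can be combined with the term for $-k$ to yield

\[ \left(\frac{-i\left(1 - \cos(k\pi)\right)}{k}\right) \frac{\sqrt{2}}{\sqrt{\pi}}\, \frac{1}{\sqrt{2\pi}} \left(e^{ikx} - e^{-ikx}\right) = \left(\frac{-i\left(1 - \cos(k\pi)\right)}{k}\right) \frac{1}{\pi}\, (2i \sin kx) = \frac{2}{\pi}\, \frac{1 - \cos(k\pi)}{k}\, \sin kx \]

The terms when $k$ is even are all 0, and when $k$ is odd, $1 - \cos(k\pi) = 2$. Therefore, the above reduces to

\[ \frac{4}{\pi} \sum_{k \text{ odd},\ 1 \le k \le n} \frac{\sin kx}{k} \]

In the case where $n = 4$, the graph of the function $f$ being approximated along with the above
function which is approximating it are as shown in the following picture.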



This sum which delivers the closest point in $U$ will be denoted by $S_n f$.

Note how the approximate function, closest in the mean square norm, is not equal to the given
function at very many points but is trying to be close to it across the entire interval $[-\pi,\pi]$, except
for a small interval centered at 0. You might try doing similar graphs on a calculator or computer
in which you take larger and larger values of n. What will happen is that there will be a little 
bump near the point of discontinuity which won't go away, but this little bump will get thinner and 
thinner. The reason this must happen is roughly because the functions in the sum are continuous 
and the function being approximated is not. Therefore, convergence cannot take place uniformly. 
This is all I will say about these considerations because this is not an analysis book. See Problem 
20 below for a discussion of where the Fourier series does converge at jumps. 
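The experiment suggested above can be sketched in a few lines of Python. This is only an illustrative sketch: the names `partial_sum` and `max_overshoot` are made up here, and the formula used is the sine form of $S_n f$ for the step function of Example 16.5.15, namely $\frac{4}{\pi}$ times the sum of $\sin(kx)/k$ over odd $k \leq n$, which results from combining the terms for $k$ and $-k$.

```python
import math

def partial_sum(x, n):
    """S_n f(x) for the step function: (4/pi) * sum of sin(kx)/k over odd k <= n."""
    return (4.0 / math.pi) * sum(math.sin(k * x) / k for k in range(1, n + 1, 2))

def max_overshoot(n, samples=4000):
    """Largest sampled value of S_n f on (0, pi); f itself equals 1 there."""
    return max(partial_sum(math.pi * j / samples, n) for j in range(1, samples))

for n in (4, 16, 64):
    # The peak stays noticeably above 1 while its location moves toward x = 0.
    print(n, round(max_overshoot(n), 4))
```

The sampled maximum stays above 1 for every $n$ tried, which is the bump near the discontinuity described above; only its width shrinks.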

16.5.6 The Discrete Fourier Transform

Everything done above for the Fourier series on $[-\pi,\pi]$ could have been done just as easily on $[0,2\pi]$
because all the functions are periodic of period $2\pi$. Thus, for $f$ a function defined on $[0,2\pi]$, you
could consider the partial sums of the Fourier series on $[0,2\pi]$

$$\sum_{k=-n}^{n} a_k e^{ikx}$$

where

$$a_k = \frac{1}{2\pi}\int_0^{2\pi} f(y)e^{-iky}\,dy$$

and all the results of the last section continue to apply except now it is on the new interval. This is
done to make the presentation of what follows easier to write.

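The idea is that maybe you don't know what the function $f$ is at all points, only at certain points

$$x_j = j\frac{2\pi}{N},\quad j = 0, 1, \cdots, N.$$

Then instead of the integral given above, you could write a Riemann sum which approximates it.
I will simply write the left Riemann sum. This yields the approximation $b_k$ for $a_k$. Assuming $f$ is
continuous, the approximation would improve as $N \to \infty$.

$$b_k = \frac{1}{2\pi}\sum_{j=0}^{N-1} e^{-ik\left(j\frac{2\pi}{N}\right)} f\left(j\frac{2\pi}{N}\right)\frac{2\pi}{N} = \frac{1}{N}\sum_{j=0}^{N-1} e^{-ik\frac{2\pi}{N}j}\,y_j$$

where $y_j$ is defined to be the value of the function at $x_j$. This is called the discrete Fourier transform.
In terms of matrix multiplication, let $\omega_N = e^{-i\frac{2\pi}{N}}$. Then

$$\begin{pmatrix} b_{-(N-1)} \\ \vdots \\ b_{-1} \\ b_0 \\ b_1 \\ \vdots \\ b_{N-1} \end{pmatrix} = \frac{1}{N}\begin{pmatrix} 1 & \omega_N^{-(N-1)} & \cdots & \omega_N^{-(N-1)(N-1)} \\ \vdots & \vdots & & \vdots \\ 1 & \omega_N^{-1} & \cdots & \omega_N^{-(N-1)} \\ 1 & 1 & \cdots & 1 \\ 1 & \omega_N & \cdots & \omega_N^{N-1} \\ \vdots & \vdots & & \vdots \\ 1 & \omega_N^{N-1} & \cdots & \omega_N^{(N-1)(N-1)} \end{pmatrix}\begin{pmatrix} y_0 \\ y_1 \\ \vdots \\ y_{N-1} \end{pmatrix}$$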

Thus you can find these approximate values by matrix multiplication. 
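A minimal Python sketch of this computation follows. The function names `dft_coefficient`, `dft_matrix` and `dft_by_matrix` are illustrative choices, not notation from the text. The code evaluates $b_k = \frac{1}{N}\sum_{j=0}^{N-1}\omega_N^{kj}y_j$ with $\omega_N = e^{-2\pi i/N}$, once as a sum and once as a matrix product, and tries it on samples of $\cos x$, whose exact coefficients are $a_{\pm 1} = 1/2$ and $a_k = 0$ otherwise.

```python
import cmath
import math

def dft_coefficient(y, k):
    """Left Riemann sum approximation b_k = (1/N) * sum_j omega^(k*j) * y_j."""
    N = len(y)
    omega = cmath.exp(-2j * math.pi / N)
    return sum(omega ** (k * j) * y[j] for j in range(N)) / N

def dft_matrix(N):
    """Rows k = -(N-1), ..., N-1 of the matrix (omega^(k*j)), without the 1/N factor."""
    omega = cmath.exp(-2j * math.pi / N)
    return [[omega ** (k * j) for j in range(N)] for k in range(-(N - 1), N)]

def dft_by_matrix(y):
    """All b_k at once: (1/N) * matrix * y, returned as a list ordered by k."""
    N = len(y)
    return [sum(row[j] * y[j] for j in range(N)) / N for row in dft_matrix(N)]

N = 8
samples = [math.cos(2 * math.pi * j / N) for j in range(N)]
for k, b in zip(range(-(N - 1), N), dft_by_matrix(samples)):
    print(k, complex(round(b.real, 10), round(b.imag, 10)))
```

Because $\omega_N^N = 1$, the approximations satisfy $b_{k+N} = b_k$, so the values printed for $|k|$ close to $N$ repeat earlier ones rather than approximating $a_k$.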

Example 16.5.16 Suppose you have the following table of values for the function $f$.

$$\begin{array}{c|c} x & f(x) \\ \hline 0 & 1 \\ \pi/2 & 2 \\ \pi & -1 \\ 3\pi/2 & 1 \\ 2\pi & 2 \end{array}$$

Note that the left Riemann sum above only uses the first four values.

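In this case, $N = 4$ and so $\omega_N = e^{-i(\pi/2)} = -i$. Then the approximate Fourier coefficients are
given by

$$\begin{pmatrix} b_{-3} \\ b_{-2} \\ b_{-1} \\ b_0 \\ b_1 \\ b_2 \\ b_3 \end{pmatrix} = \frac{1}{4}\begin{pmatrix} 1 & (-i)^{-3} & (-i)^{-6} & (-i)^{-9} \\ 1 & (-i)^{-2} & (-i)^{-4} & (-i)^{-6} \\ 1 & (-i)^{-1} & (-i)^{-2} & (-i)^{-3} \\ 1 & 1 & 1 & 1 \\ 1 & (-i) & (-i)^{2} & (-i)^{3} \\ 1 & (-i)^{2} & (-i)^{4} & (-i)^{6} \\ 1 & (-i)^{3} & (-i)^{6} & (-i)^{9} \end{pmatrix}\begin{pmatrix} 1 \\ 2 \\ -1 \\ 1 \end{pmatrix}$$

$$= \frac{1}{4}\begin{pmatrix} 1 & -i & -1 & i \\ 1 & -1 & 1 & -1 \\ 1 & i & -1 & -i \\ 1 & 1 & 1 & 1 \\ 1 & -i & -1 & i \\ 1 & -1 & 1 & -1 \\ 1 & i & -1 & -i \end{pmatrix}\begin{pmatrix} 1 \\ 2 \\ -1 \\ 1 \end{pmatrix} = \frac{1}{4}\begin{pmatrix} 2-i \\ -3 \\ 2+i \\ 3 \\ 2-i \\ -3 \\ 2+i \end{pmatrix}$$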

It follows that the approximate Fourier series for the given function is

$$\frac{1}{4}\left((2-i)e^{-3ix} + (-3)e^{-2ix} + (2+i)e^{-ix} + 3 + (2-i)e^{ix} + (-3)e^{2ix} + (2+i)e^{3ix}\right)$$



This simplifies to

$$\frac{3}{4} + 2\left(\frac{1}{2}\cos x + \frac{1}{4}\sin x\right) - \frac{3}{2}\cos(2x) + 2\left(\frac{1}{2}\cos 3x - \frac{1}{4}\sin 3x\right)$$
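The arithmetic in this example is easy to check by machine. The sketch below is an illustration, with hypothetical helper names `coefficients`, `series` and `trig_form`. It recomputes $4b_k$ from the four sample values and compares the complex form of the approximate Fourier series with the real form $\frac{3}{4} + \cos x + \frac{1}{2}\sin x - \frac{3}{2}\cos 2x + \cos 3x - \frac{1}{2}\sin 3x$ obtained by pairing the terms for $k$ and $-k$.

```python
import cmath
import math

y = [1, 2, -1, 1]          # sample values at 0, pi/2, pi, 3*pi/2
N = len(y)
omega = -1j                # omega_N = e^{-i*pi/2} for N = 4

def coefficients():
    """Return {k: b_k} for k = -3, ..., 3 using b_k = (1/N) sum_j omega^(k*j) y_j."""
    return {k: sum(omega ** (k * j) * y[j] for j in range(N)) / N
            for k in range(-(N - 1), N)}

def series(x):
    """Complex form of the approximate Fourier series: sum of b_k e^{ikx}."""
    return sum(b * cmath.exp(1j * k * x) for k, b in coefficients().items())

def trig_form(x):
    """Real form obtained by pairing the k and -k terms."""
    return (0.75 + math.cos(x) + 0.5 * math.sin(x) - 1.5 * math.cos(2 * x)
            + math.cos(3 * x) - 0.5 * math.sin(3 * x))

print({k: 4 * b for k, b in coefficients().items()})
print(abs(series(1.0) - trig_form(1.0)))
```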



If you graph this, it will not do all that well in approximating some functions which have the given
values at the given points. This is not surprising since only four points were considered. This is
why in practice, people like to use a large number of points and when you do, the computations
become sufficiently long that a special algorithm was developed for doing them. It is called the fast
Fourier transform. So when you see this mentioned, this is what it is about, efficiently computing the
discrete Fourier transform which can be thought of as a way to approximate the Fourier coefficients
based on incomplete information for a given function.

16.6 Exercises 

1. Verify that Examples 16.5.1 - 16.5.4 are each inner product spaces. 

2. In each of the examples 16.5.1 - 16.5.4 write the Cauchy Schwarz inequality. 

3. Verify 16.16 and 16.17. 

4. Consider the Cauchy Schwarz inequality. Show that it still holds under the assumptions 

$(u, v) = \overline{(v, u)}$, $((au + bv), z) = a(u, z) + b(v, z)$, and $(u, u) \geq 0$. Thus it is not necessary to
say that $(u, u) = 0$ only if $u = 0$. It is enough to simply state that $(u, u) \geq 0$.

5. Consider the integers modulo a prime, $\mathbb{Z}_p$. This is a field of scalars. Now let the vector space
be $(\mathbb{Z}_p)^n$ where $n \geq p$. Define now

$$(z, w) \equiv \sum_{i=1}^{n} z_i w_i$$

Does this satisfy the axioms of an inner product? Does the Cauchy Schwarz inequality hold
for this $(\cdot,\cdot)$? Does the Cauchy Schwarz inequality even make any sense?

6. If you only know that $(u, u) \geq 0$ along with the other axioms of the inner product and if you
define $|z|$ the same way, how do the conclusions of Theorem 16.5.7 change?

7. In an inner product space, an open ball is the set

$$B(x, r) \equiv \{y : |y - x| < r\}.$$

If $z \in B(x, r)$, show there exists $\delta > 0$ such that $B(z, \delta) \subseteq B(x, r)$. In words, this says that
an open ball is open. Hint: This depends on the triangle inequality.

8. Let $V$ be the real inner product space consisting of continuous functions defined on $[-1,1]$
with the inner product given by

$$\int_{-1}^{1} f(x)g(x)\,dx$$

Show that $\{1, x, x^2\}$ are linearly independent and find an orthonormal basis for the span of
these vectors.

9. A regular Sturm Liouville problem involves the differential equation for an unknown
function of $x$ which is denoted here by $y$,

$$(p(x)y')' + (\lambda q(x) + r(x))y = 0,\quad x \in [a, b]$$

and it is assumed that $p(t), q(t) > 0$ for any $t$ along with boundary conditions,

$$C_1 y(a) + C_2 y'(a) = 0$$

$$C_3 y(b) + C_4 y'(b) = 0$$

where

$$C_1^2 + C_2^2 > 0,\ \text{and}\ C_3^2 + C_4^2 > 0.$$

There is an immense theory connected to these important problems. The constant $\lambda$ is called
an eigenvalue. Show that if $y$ is a solution to the above problem corresponding to $\lambda = \lambda_1$ and
if $z$ is a solution corresponding to $\lambda = \lambda_2 \neq \lambda_1$, then

$$\int_a^b q(x)y(x)z(x)\,dx = 0. \qquad (16.22)$$

Hint: Do something like this:

$$(p(x)y')'z + (\lambda_1 q(x) + r(x))yz = 0,$$

$$(p(x)z')'y + (\lambda_2 q(x) + r(x))zy = 0.$$

Now subtract and either use integration by parts or show

$$(p(x)y')'z - (p(x)z')'y = \left((p(x)y')z - (p(x)z')y\right)'$$

and then integrate. Use the boundary conditions to show that $y'(a)z(a) - z'(a)y(a) = 0$ and
$y'(b)z(b) - z'(b)y(b) = 0$.




10. Using the above problem or standard techniques of calculus, show that

$$\left\{\frac{\sqrt{2}}{\sqrt{\pi}}\sin(nx)\right\}_{n=1}^{\infty}$$

are orthonormal with respect to the inner product

$$(f, g) = \int_0^{\pi} f(x)g(x)\,dx$$

Hint: If you want to use the above problem, show that $\sin(nx)$ is a solution to the boundary
value problem

$$y'' + n^2 y = 0,\quad y(0) = y(\pi) = 0$$

11. Find $S_5 f(x)$ where $f(x) = x$ on $[-\pi,\pi]$. Then graph both $S_5 f(x)$ and $f(x)$ if you have access
to a system which will do a good job of it.

12. Find $S_5 f(x)$ where $f(x) = |x|$ on $[-\pi,\pi]$. Then graph both $S_5 f(x)$ and $f(x)$ if you have
access to a system which will do a good job of it.

13. Find $S_5 f(x)$ where $f(x) = x^2$ on $[-\pi,\pi]$. Then graph both $S_5 f(x)$ and $f(x)$ if you have
access to a system which will do a good job of it.

14. Let $V$ be the set of real polynomials defined on $[0,1]$ which have degree at most 2. Make this
into a real inner product space by defining

$$(f, g) \equiv f(0)g(0) + f(1/2)g(1/2) + f(1)g(1)$$

Find an orthonormal basis and explain why this is an inner product.

15. Consider $\mathbb{R}^n$ with the following definition.

$$(x, y) \equiv \sum_{i=1}^{n} x_i y_i\, i$$

Does this define an inner product? If so, explain why and state the Cauchy Schwarz inequality
in terms of sums.

16. From the above, for $f$ a piecewise continuous function,

$$S_n f(x) = \frac{1}{2\pi}\sum_{k=-n}^{n} e^{ikx}\left(\int_{-\pi}^{\pi} f(y)e^{-iky}\,dy\right).$$

Show this can be written in the form

$$S_n f(x) = \int_{-\pi}^{\pi} f(y)D_n(x - y)\,dy$$

where

$$D_n(t) = \frac{1}{2\pi}\sum_{k=-n}^{n} e^{ikt}.$$

This is called the Dirichlet kernel. Show that

$$D_n(t) = \frac{1}{2\pi}\,\frac{\sin\left((n + (1/2))t\right)}{\sin(t/2)}.$$

For $V$ the vector space of piecewise continuous functions, define $S_n : V \to V$ by

$$S_n f(x) = \int_{-\pi}^{\pi} f(y)D_n(x - y)\,dy.$$

Show that $S_n$ is a linear transformation. (In fact, $S_n f$ is not just piecewise continuous but
infinitely differentiable. Why?) Explain why $\int_{-\pi}^{\pi} D_n(t)\,dt = 1$. Hint: To obtain the formula,
do the following.

$$e^{i(t/2)}D_n(t) = \frac{1}{2\pi}\sum_{k=-n}^{n} e^{i(k+(1/2))t}$$

$$e^{-i(t/2)}D_n(t) = \frac{1}{2\pi}\sum_{k=-n}^{n} e^{i(k-(1/2))t}$$

Change the variable of summation in the bottom sum and then subtract and solve for $D_n(t)$.

17. ↑Let $V$ be an inner product space and let $U$ be a finite dimensional subspace with an orthonormal
basis $\{u_i\}_{i=1}^{n}$. If $y \in V$, show

$$|y|^2 \geq \sum_{k=1}^{n} |(y, u_k)|^2$$

Now suppose that $\{u_k\}_{k=1}^{\infty}$ is an orthonormal set of vectors of $V$. Explain why

$$\lim_{k\to\infty} (y, u_k) = 0.$$

When applied to functions, this is a special case of the Riemann Lebesgue lemma.

18. ↑Let $f$ be any piecewise continuous function which is bounded on $[-\pi,\pi]$. Show, using the
above problem, that

$$\lim_{n\to\infty}\int_{-\pi}^{\pi} f(t)\sin(nt)\,dt = \lim_{n\to\infty}\int_{-\pi}^{\pi} f(t)\cos(nt)\,dt = 0.$$

19. ↑*Let $f$ be a function which is defined on $(-\pi,\pi]$. The $2\pi$ periodic extension is given by
the formula $f(x + 2\pi) = f(x)$. In the rest of this problem, $f$ will refer to this $2\pi$ periodic
extension. Assume that $f$ is piecewise continuous, bounded, and also that the following limits
exist

$$\lim_{y\to 0+}\frac{f(x+y) - f(x+)}{y},\quad \lim_{y\to 0+}\frac{f(x-y) - f(x-)}{y}.$$

Here it is assumed that

$$f(x+) \equiv \lim_{h\to 0+} f(x+h),\quad f(x-) \equiv \lim_{h\to 0+} f(x-h)$$

both exist at every point. The above conditions rule out functions where the slope taken
from either side becomes infinite. Justify the following assertions and eventually conclude that
under these very reasonable conditions

$$\lim_{n\to\infty} S_n f(x) = \left(f(x+) + f(x-)\right)/2,$$

the mid point of the jump. In words, the Fourier series converges to the midpoint of the jump
of the function.

$$S_n f(x) = \int_{-\pi}^{\pi} f(x-y)D_n(y)\,dy$$

$$\left|S_n f(x) - \frac{f(x+) + f(x-)}{2}\right| = \left|\int_{-\pi}^{\pi}\left(f(x-y) - \frac{f(x+) + f(x-)}{2}\right)D_n(y)\,dy\right|$$

$$= \left|\int_0^{\pi} f(x-y)D_n(y)\,dy + \int_0^{\pi} f(x+y)D_n(y)\,dy - \int_0^{\pi}\left(f(x+) + f(x-)\right)D_n(y)\,dy\right|$$

$$\leq \left|\int_0^{\pi}\left(f(x-y) - f(x-)\right)D_n(y)\,dy\right| + \left|\int_0^{\pi}\left(f(x+y) - f(x+)\right)D_n(y)\,dy\right|$$

Now apply some trig. identities and use the result of Problem 18 to conclude that both of
these terms must converge to 0.

20. ↑Using the Fourier series obtained in Problem 11 and the result of Problem 19 above, find an
interesting formula by examining where the Fourier series converges when $x = \pi/2$. Of course
you can get many other interesting formulas in the same way. Hint: You should get

$$S_n f(x) = \sum_{k=1}^{n}\frac{2(-1)^{k+1}}{k}\sin(kx)$$

21. Let $V$ be an inner product space and let $K$ be a convex subset of $V$. This means that if
$x, z \in K$, then the line segment $x + t(z - x) = (1 - t)x + tz$ is contained in $K$ for all $t \in [0,1]$.
Note that every subspace is a convex set. Let $y \in V$ and let $x \in K$. Show that $x$ is the closest
point to $y$ out of all points in $K$ if and only if for all $w \in K$,

$$\operatorname{Re}(y - x, w - x) \leq 0.$$

In $\mathbb{R}^n$, a picture of the above situation where $x$ is the closest point to $y$ is as follows.

The condition of the above variational inequality is that the angle $\theta$ shown in the picture is
larger than 90 degrees. Recall the geometric description of the dot product presented earlier.
See Page 49.

22. Show that in any inner product space the parallelogram identity holds.

$$|x + y|^2 + |x - y|^2 = 2|x|^2 + 2|y|^2$$

Next show that in a real inner product space, the polarization identity holds.

$$(x, y) = \frac{1}{4}\left(|x + y|^2 - |x - y|^2\right).$$

23. *This problem is for those who know about Cauchy sequences and completeness of $\mathbb{F}^p$ and
about closed sets. Suppose $K$ is a closed nonempty convex subset of a finite dimensional
subspace $U$ of an inner product space $V$. Let $y \in V$. Then show there exists a unique point
$x \in K$ which is closest to $y$. Hint: Let

$$\lambda = \inf\{|y - z| : z \in K\}$$

Let $\{x_n\}$ be a minimizing sequence,

$$|y - x_n| \to \lambda.$$

Use the parallelogram identity in the above problem to show that $\{x_n\}$ is a Cauchy sequence.
Now let $\{u_k\}_{k=1}^{p}$ be an orthonormal basis for $U$. Say

$$x_n = \sum_{k=1}^{p} c_k^n u_k$$

Verify that for $c^n \equiv (c_1^n, \cdots, c_p^n) \in \mathbb{F}^p$

$$|x_n - x_m| = |c^n - c^m|_{\mathbb{F}^p}.$$

Now use completeness of $\mathbb{F}^p$ and the assumption that $K$ is closed to get the existence of $x \in K$
such that $|x - y| = \lambda$.

24. *Let $K$ be a closed nonempty convex subset of a finite dimensional subspace $U$ of a real inner
product space $V$. (It is true for complex ones also.) For $x \in V$, denote by $Px$ the unique
closest point to $x$ in $K$. Verify that $P$ is Lipschitz continuous with Lipschitz constant 1,

$$|Px - Py| \leq |x - y|.$$

Hint: Use Problem 21.

25. * This problem is for people who know about compactness. It is an analysis problem. If
you have only had the usual undergraduate calculus course, don't waste your time with this
problem. Suppose $V$ is a finite dimensional normed linear space. Recall this means that there
exists a norm $\|\cdot\|$ defined on $V$ as described above,

$$\|v\| \geq 0\ \text{equals 0 if and only if}\ v = 0$$

$$\|v + u\| \leq \|u\| + \|v\|,\quad \|\alpha v\| = |\alpha|\,\|v\|.$$

Let $|\cdot|$ denote the norm which comes from Example 16.5.3, the inner product by decree. Show
$|\cdot|$ and $\|\cdot\|$ are equivalent. That is, there exist constants $\delta, \Delta > 0$ such that for all $x \in V$,

$$\delta|x| \leq \|x\| \leq \Delta|x|.$$

Explain why every two norms on a finite dimensional vector space must be equivalent in the
above sense.

17.1 Matrix Multiplication As A Linear Transformation

Definition 17.1.1 Let $V$ and $W$ be two finite dimensional vector spaces. A function $L$ which maps
$V$ to $W$ is called a linear transformation and $L \in \mathcal{L}(V, W)$ if for all scalars $\alpha$ and $\beta$, and vectors
$\mathbf{v}, \mathbf{w}$,

$$L(\alpha\mathbf{v} + \beta\mathbf{w}) = \alpha L(\mathbf{v}) + \beta L(\mathbf{w}).$$

These linear transformations are also called homomorphisms. If one of them is one to one, it is
called injective and if it is onto, it is called surjective. When a linear transformation is both one to
one and onto, it is called bijective.

An example of a linear transformation is familiar matrix multiplication. Let $A = (a_{ij})$ be an
$m \times n$ matrix. Then an example of a linear transformation $L : \mathbb{F}^n \to \mathbb{F}^m$ is given by

$$(L\mathbf{v})_i \equiv \sum_{j=1}^{n} a_{ij}v_j.$$

Here

$$\mathbf{v} \equiv \begin{pmatrix} v_1 \\ \vdots \\ v_n \end{pmatrix} \in \mathbb{F}^n.$$
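As a quick sanity check of the definition, the following Python sketch (the helper name `apply` and the sample numbers are arbitrary choices made here) computes $(L\mathbf{v})_i = \sum_j a_{ij}v_j$ and confirms $L(\alpha\mathbf{v} + \beta\mathbf{w}) = \alpha L\mathbf{v} + \beta L\mathbf{w}$ on one example, using exact rational arithmetic.

```python
from fractions import Fraction

def apply(A, v):
    """Return Lv where (Lv)_i = sum_j a_ij * v_j."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

A = [[1, 2, 0],
     [0, -1, 3]]                      # a 2 x 3 matrix, so L maps F^3 to F^2
v = [Fraction(1), Fraction(2), Fraction(3)]
w = [Fraction(-1), Fraction(0), Fraction(1, 2)]
alpha, beta = Fraction(2, 3), Fraction(-5)

# Left side: L applied to the combination; right side: combination of the images.
left = apply(A, [alpha * a + beta * b for a, b in zip(v, w)])
right = [alpha * a + beta * b for a, b in zip(apply(A, v), apply(A, w))]
print(left == right)
```

One numerical instance is of course not a proof; linearity in general follows from the distributive law applied to each sum.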



17.2 $\mathcal{L}(V, W)$ As A Vector Space

In what follows I will denote vectors in bold face. However, this does not mean they are in $\mathbb{F}^n$.
Definition 17.2.1 Given $L, M \in \mathcal{L}(V, W)$ define a new element of $\mathcal{L}(V, W)$, denoted by $L + M$
according to the rule

$$(L + M)\mathbf{v} \equiv L\mathbf{v} + M\mathbf{v}.$$

For $\alpha$ a scalar and $L \in \mathcal{L}(V, W)$, define $\alpha L \in \mathcal{L}(V, W)$ by

$$\alpha L(\mathbf{v}) \equiv \alpha(L\mathbf{v}).$$

You should verify that all the axioms of a vector space hold for $\mathcal{L}(V, W)$ with the above definitions
of vector addition and scalar multiplication. What about the dimension of $\mathcal{L}(V, W)$?

Theorem 17.2.2 Let $V$ and $W$ be finite dimensional linear spaces of dimension $n$ and $m$ respectively.
Then $\dim(\mathcal{L}(V, W)) = mn$.
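To make the count $mn$ concrete before the proof, take $V = \mathbb{F}^n$ and $W = \mathbb{F}^m$ with their standard bases, so that linear transformations are $m \times n$ matrices. The Python sketch below (function names are illustrative only) builds the $mn$ matrices having a single entry equal to 1 and checks that an arbitrary matrix is the combination of them whose coefficients are its own entries.

```python
def unit_matrix(i, k, m, n):
    """The m x n matrix with a 1 in row i, column k and zeros elsewhere."""
    return [[1 if (r, c) == (i, k) else 0 for c in range(n)] for r in range(m)]

def combine(coeffs, m, n):
    """Return the sum over (i, k) of coeffs[i][k] * unit_matrix(i, k, m, n)."""
    total = [[0] * n for _ in range(m)]
    for i in range(m):
        for k in range(n):
            E = unit_matrix(i, k, m, n)
            for r in range(m):
                for c in range(n):
                    total[r][c] += coeffs[i][k] * E[r][c]
    return total

m, n = 2, 3
L = [[4, -1, 0],
     [7, 2, 5]]
units = [unit_matrix(i, k, m, n) for i in range(m) for k in range(n)]
print(len(units), combine(L, m, n) == L)
```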

Proof: Let the two sets of bases be

$$\{\mathbf{v}_1, \cdots, \mathbf{v}_n\}\ \text{and}\ \{\mathbf{w}_1, \cdots, \mathbf{w}_m\}$$

for $V$ and $W$ respectively. Let $E_{ik} \in \mathcal{L}(V, W)$ be the linear transformation defined on the basis,
$\{\mathbf{v}_1, \cdots, \mathbf{v}_n\}$, by

$$E_{ik}\mathbf{v}_j \equiv \mathbf{w}_i\delta_{jk}$$

where $\delta_{ik} = 1$ if $i = k$ and 0 if $i \neq k$. Thus

$$E_{ik}\left(\sum_{s=1}^{n} c_s\mathbf{v}_s\right) = \sum_{s=1}^{n} c_s E_{ik}\mathbf{v}_s = \sum_{s=1}^{n} c_s\mathbf{w}_i\delta_{sk} = c_k\mathbf{w}_i.$$

Then let $L \in \mathcal{L}(V, W)$. Since $\{\mathbf{w}_1, \cdots, \mathbf{w}_m\}$ is a basis, there exist constants $d_{jk}$ such that

$$L\mathbf{v}_r = \sum_{j=1}^{m} d_{jr}\mathbf{w}_j$$

Also

$$\sum_{j=1}^{m}\sum_{k=1}^{n} d_{jk}E_{jk}(\mathbf{v}_r) = \sum_{j=1}^{m} d_{jr}\mathbf{w}_j.$$

It follows that

$$L = \sum_{j=1}^{m}\sum_{k=1}^{n} d_{jk}E_{jk}$$

because the two linear transformations agree on a basis. Since $L$ is arbitrary, this shows

$$\{E_{ik} : i = 1, \cdots, m,\ k = 1, \cdots, n\}$$

spans $\mathcal{L}(V, W)$.
If

$$\sum_{i,k} d_{ik}E_{ik} = 0,$$

then

$$0 = \sum_{i,k} d_{ik}E_{ik}(\mathbf{v}_l) = \sum_{i=1}^{m} d_{il}\mathbf{w}_i$$

and so, since $\{\mathbf{w}_1, \cdots, \mathbf{w}_m\}$ is a basis, $d_{il} = 0$ for each $i = 1, \cdots, m$. Since $l$ is arbitrary, this shows
$d_{il} = 0$ for all $i$ and $l$. Thus these linear transformations form a basis and this shows the dimension
of $\mathcal{L}(V, W)$ is $mn$ as claimed. ■

17.3 Eigenvalues And Eigenvectors Of Linear Transformations

Here is a very useful theorem due to Sylvester. 

Theorem 17.3.1 Let $A \in \mathcal{L}(V, W)$ and $B \in \mathcal{L}(W, U)$ where $V, W, U$ are all vector spaces over a
field $\mathbb{F}$. Suppose also that $\ker(A)$ and $A(\ker(BA))$ are finite dimensional subspaces. Then

$$\dim(\ker(BA)) \leq \dim(\ker(B)) + \dim(\ker(A)).$$

Proof: If $x \in \ker(BA)$, then $Ax \in \ker(B)$ and so $A(\ker(BA)) \subseteq \ker(B)$. The following picture
may help.

Now let $\{x_1, \cdots, x_n\}$ be a basis of $\ker(A)$ and let $\{Ay_1, \cdots, Ay_m\}$ be a basis for $A(\ker(BA))$.
Take any $z \in \ker(BA)$. Then $Az = \sum_{i=1}^{m} a_i Ay_i$ and so

$$A\left(z - \sum_{i=1}^{m} a_i y_i\right) = 0$$

which means $z - \sum_{i=1}^{m} a_i y_i \in \ker(A)$ and so there are scalars $b_i$ such that

$$z - \sum_{i=1}^{m} a_i y_i = \sum_{j=1}^{n} b_j x_j.$$

It follows $\operatorname{span}(x_1, \cdots, x_n, y_1, \cdots, y_m) \supseteq \ker(BA)$ and so by the first part, (See the picture.)

$$\dim(\ker(BA)) \leq n + m \leq \dim(\ker(A)) + \dim(\ker(B))\ ■$$
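For matrices, the inequality is easy to test numerically. The following Python sketch is an illustration under the assumption that $A$ and $B$ are given as matrices with rational entries; `rank`, `nullity` and `matmul` are helper names introduced here, and the rank is found by Gaussian elimination with exact fractions.

```python
from fractions import Fraction

def rank(M):
    """Rank of a matrix by row reduction over the rationals."""
    rows = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue                      # no pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def nullity(M):
    """dim ker(M) = number of columns minus rank."""
    return len(M[0]) - rank(M)

def matmul(X, Y):
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

A = [[1, 0, 0], [0, 1, 0], [0, 0, 0]]
B = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
BA = matmul(B, A)
print(nullity(BA), nullity(A), nullity(B))
```

In this particular example the two sides of the inequality agree, so the bound cannot be improved in general.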

Of course this result holds for any finite product of linear transformations by induction. One way
this is quite useful is in the case where you have a finite product of linear transformations $\prod_{i=1}^{l} L_i$
all in $\mathcal{L}(V, V)$. Then

$$\dim\left(\ker\prod_{i=1}^{l} L_i\right) \leq \sum_{i=1}^{l}\dim(\ker L_i)$$

and so if you can find a linearly independent set of vectors in $\ker\left(\prod_{i=1}^{l} L_i\right)$ of size

$$\sum_{i=1}^{l}\dim(\ker L_i),$$

then it must be a basis for $\ker\left(\prod_{i=1}^{l} L_i\right)$.

Definition 17.3.2 Let $\{V_i\}_{i=1}^{r}$ be subspaces of $V$. Then

$$\sum_{i=1}^{r} V_i$$

denotes all sums of the form $\sum_{i=1}^{r} v_i$ where $v_i \in V_i$. If whenever

$$\sum_{i=1}^{r} v_i = 0,\ v_i \in V_i, \qquad (17.1)$$

it follows that $v_i = 0$ for each $i$, then a special notation is used to denote $\sum_{i=1}^{r} V_i$. This notation is

$$V_1 \oplus \cdots \oplus V_r$$

and it is called a direct sum of subspaces.

Lemma 17.3.3 If $V = V_1 \oplus \cdots \oplus V_r$ and if $\beta_i = \{v_1^i, \cdots, v_{m_i}^i\}$ is a basis for $V_i$, then a basis for
$V$ is $\{\beta_1, \cdots, \beta_r\}$.

Proof: Suppose $\sum_{i=1}^{r}\sum_{j=1}^{m_i} c_{ij}v_j^i = 0$. Then since it is a direct sum, it follows for each $i$,

$$\sum_{j=1}^{m_i} c_{ij}v_j^i = 0$$

and now since $\{v_1^i, \cdots, v_{m_i}^i\}$ is a basis, each $c_{ij} = 0$. ■
Here is a useful lemma. 

Lemma 17.3.4 Let $L_i$ be in $\mathcal{L}(V, V)$ and suppose for $i \neq j$, $L_iL_j = L_jL_i$ and also $L_i$ is one to one
on $\ker(L_j)$ whenever $i \neq j$. Then

$$\ker\left(\prod_{i=1}^{p} L_i\right) = \ker(L_1) \oplus \cdots \oplus \ker(L_p)$$

Here $\prod_{i=1}^{p} L_i$ is the product of all the linear transformations. A symbol like $\prod_{j\neq i} L_j$ is the product
of all of them but $L_i$.

Proof: Note that since the operators commute, $L_j : \ker(L_i) \to \ker(L_i)$. Here is why. If $L_iy = 0$
so that $y \in \ker(L_i)$, then

$$L_iL_jy = L_jL_iy = L_j0 = 0$$

and so $L_j : \ker(L_i) \to \ker(L_i)$. Suppose

$$\sum_{i=1}^{p} v_i = 0,\ v_i \in \ker(L_i),$$

but some $v_i \neq 0$. Then do $\prod_{j\neq i} L_j$ to both sides. Since the linear transformations commute, this
results in

$$\prod_{j\neq i} L_j(v_i) = 0$$

which contradicts the assumption that these $L_j$ are one to one and the observation that they map
$\ker(L_i)$ to $\ker(L_i)$. Thus if

$$\sum_i v_i = 0,\ v_i \in \ker(L_i)$$

then each $v_i = 0$.

Let $\beta_i = \{v_1^i, \cdots, v_{m_i}^i\}$ be a basis for $\ker(L_i)$. Then from what was just shown and Lemma
17.3.3, $\{\beta_1, \cdots, \beta_p\}$ must be linearly independent and a basis for

$$\ker(L_1) \oplus \cdots \oplus \ker(L_p).$$

It is also clear that since these operators commute,

$$\ker(L_1) \oplus \cdots \oplus \ker(L_p) \subseteq \ker\left(\prod_{i=1}^{p} L_i\right)$$

Therefore, by Sylvester's theorem and the above,

$$\dim\left(\ker\left(\prod_{i=1}^{p} L_i\right)\right) \leq \sum_{j=1}^{p}\dim(\ker(L_j))$$

$$= \dim\left(\ker(L_1) \oplus \cdots \oplus \ker(L_p)\right) \leq \dim\left(\ker\left(\prod_{i=1}^{p} L_i\right)\right).$$

Now in general, if $W$ is a subspace of $V$, a finite dimensional vector space and the two have the same
dimension, then $W = V$. This is because $W$ has a basis and if $v$ is not in the span of this basis,
then $v$ adjoined to the basis of $W$ would be a linearly independent set so the dimension of $V$ would
then be strictly larger than the dimension of $W$.
It follows

$$\ker(L_1) \oplus \cdots \oplus \ker(L_p) = \ker\left(\prod_{i=1}^{p} L_i\right)\ ■$$



Here is a situation in which the above holds. $\ker(A - \lambda_iI)^{r}$ is sometimes called a generalized
eigenspace. The following is an important result on generalized eigenspaces.

Theorem 17.3.5 Let $V$ be a vector space of dimension $n$ and $A$ a linear transformation and suppose
$\{\lambda_1, \cdots, \lambda_p\}$ are distinct scalars. Define for $r_i \in \mathbb{N}$

$$V_i = \ker(A - \lambda_iI)^{r_i} \qquad (17.2)$$

Then

$$\ker\left(\prod_{i=1}^{p}(A - \lambda_iI)^{r_i}\right) = V_1 \oplus \cdots \oplus V_p. \qquad (17.3)$$

Proof: It is obvious the linear transformations $(A - \lambda_iI)^{r_i}$ commute. Now here is a claim.
Claim: Let $\mu \neq \lambda_i$. Then $(A - \mu I)^m : V_i \to V_i$ and is one to one and onto for every $m \in \mathbb{N}$.
Proof: It is clear $(A - \mu I)^m$ maps $V_i$ to $V_i$ because if $v \in V_i$ then $(A - \lambda_iI)^{r_i}v = 0$. Consequently,

$$(A - \lambda_iI)^{r_i}(A - \mu I)^m v = (A - \mu I)^m(A - \lambda_iI)^{r_i}v = (A - \mu I)^m 0 = 0$$

which shows that $(A - \mu I)^m v \in V_i$.

It remains to verify $(A - \mu I)^m$ is one to one. This will be done by showing that $(A - \mu I)$ is one to
one. Let $w \in V_i$ and suppose $(A - \mu I)w = 0$ so that $Aw = \mu w$. Then for $m \equiv r_i$, $(A - \lambda_iI)^m w = 0$
and so by the binomial theorem,

$$(\mu - \lambda_i)^m w = \sum_{l=0}^{m}\binom{m}{l}(-\lambda_i)^{m-l}\mu^l w = \sum_{l=0}^{m}\binom{m}{l}(-\lambda_i)^{m-l}A^l w = (A - \lambda_iI)^m w = 0.$$

Therefore, since $\mu \neq \lambda_i$, it follows $w = 0$ and this verifies $(A - \mu I)$ is one to one. Thus $(A - \mu I)^m$
is also one to one on $V_i$. Letting $\{u_1^i, \cdots, u_{r_k}^i\}$ be a basis for $V_i$, it follows

$$\{(A - \mu I)^m u_1^i, \cdots, (A - \mu I)^m u_{r_k}^i\}$$

is also a basis and so $(A - \mu I)^m$ is also onto. The desired result now follows from Lemma 17.3.4. ■
Let $V$ be a finite dimensional vector space with field of scalars $\mathbb{C}$. For example, it could be a
subspace of $\mathbb{C}^n$. Also suppose $A \in \mathcal{L}(V, V)$. Does $A$ have eigenvalues and eigenvectors just like the
case where $A$ is an $n \times n$ matrix?

Theorem 17.3.6 Let $V$ be a nonzero finite dimensional vector space of dimension $n$. Suppose also
the field of scalars equals $\mathbb{C}$ (see footnote 1). Suppose $A \in \mathcal{L}(V, V)$. Then there exists $v \neq 0$ and $\lambda \in \mathbb{C}$ such that

$$Av = \lambda v.$$

Proof: Consider the linear transformations, $I, A, A^2, \cdots, A^{n^2}$. There are $n^2 + 1$ of these transformations
and so by Theorem 17.2.2 the set is linearly dependent. Thus there exist constants,
$c_i \in \mathbb{C}$, not all zero, such that

$$c_0I + \sum_{k=1}^{n^2} c_kA^k = 0.$$

This implies there exists a polynomial, $q(\lambda)$ which has the property that $q(A) = 0$. In fact, $q(\lambda) \equiv
c_0 + \sum_{k=1}^{n^2} c_k\lambda^k$. Dividing by the leading term, it can be assumed this polynomial is of the form
$\lambda^m + c_{m-1}\lambda^{m-1} + \cdots + c_1\lambda + c_0$, a monic polynomial. Now consider all such monic polynomials,
$q$ such that $q(A) = 0$ and pick one which has the smallest degree. This is called the minimal
polynomial and will be denoted here by $p(\lambda)$. By the fundamental theorem of algebra, $p(\lambda)$ is of
the form

$$p(\lambda) = \prod_{k=1}^{m}(\lambda - \lambda_k)$$

where some of the $\lambda_k$ might be repeated. Thus, since $p$ has minimal degree,

$$\prod_{k=1}^{m}(A - \lambda_kI) = 0,\ \text{but}\ \prod_{k=1}^{m-1}(A - \lambda_kI) \neq 0.$$

Therefore, there exists $u \neq 0$ such that

$$v \equiv \left(\prod_{k=1}^{m-1}(A - \lambda_kI)\right)(u) \neq 0.$$

But then

$$(A - \lambda_mI)v = (A - \lambda_mI)\left(\prod_{k=1}^{m-1}(A - \lambda_kI)\right)(u) = 0.\ ■$$

As a corollary, it is good to mention that the minimal polynomial just discussed is unique. 

Corollary 17.3.7 Let $A \in \mathcal{L}(V, V)$ where $V$ is an $n$ dimensional vector space, the field of scalars
being $\mathbb{F}$. Then there exists a polynomial $q(\lambda)$ having coefficients in $\mathbb{F}$ such that $q(A) = 0$. Letting
$p(\lambda)$ be the monic polynomial having smallest degree such that $p(A) = 0$, it follows that $p(\lambda)$ is
unique.

Proof: The existence of $p(\lambda)$ follows from the above theorem. Suppose then that $p_1(\lambda)$ is
another one. That is, it has minimal degree of all polynomials $q(\lambda)$ satisfying $q(A) = 0$ and is
monic. Then by Lemma 16.3.3 there exists $r(\lambda)$ which is either equal to 0 or has degree smaller
than that of $p(\lambda)$ and a polynomial $l(\lambda)$ such that

$$p_1(\lambda) = p(\lambda)l(\lambda) + r(\lambda)$$

Since $p_1(A) = 0$ and $p(A) = 0$, it follows that $r(A) = 0$. Therefore, $r(\lambda) = 0$, since otherwise dividing $r(\lambda)$
by its leading coefficient would give a monic polynomial of smaller degree than $p(\lambda)$ which sends $A$ to 0.
Also by assumption, $p_1(\lambda)$ and $p(\lambda)$ have the same
degree and so $l(\lambda)$ is a scalar. Since $p_1(\lambda)$ and $p(\lambda)$ are both monic, it follows this scalar must
equal 1. This shows uniqueness. ■



1 All that is really needed is that the minimal polynomial can be completely factored in the given field. The complex
numbers have this property from the fundamental theorem of algebra.

Corollary 17.3.8 In the above theorem, each of the scalars $\lambda_k$ has the property that there exists a
nonzero $v$ such that $(A - \lambda_kI)v = 0$. Furthermore the $\lambda_i$ are the only scalars with this property.

Proof: For the first claim, just factor out $(A - \lambda_kI)$ instead of $(A - \lambda_mI)$. Next suppose

$$(A - \mu I)v = 0$$

for some $\mu$ and $v \neq 0$. Then

$$0 = \prod_{k=1}^{m}(A - \lambda_kI)v = \prod_{k=1}^{m-1}(A - \lambda_kI)\left(Av - \lambda_mv\right)$$

$$= (\mu - \lambda_m)\left(\prod_{k=1}^{m-1}(A - \lambda_kI)\right)v$$

$$= (\mu - \lambda_m)\left(\prod_{k=1}^{m-2}(A - \lambda_kI)\right)\left(Av - \lambda_{m-1}v\right)$$

$$= (\mu - \lambda_m)(\mu - \lambda_{m-1})\left(\prod_{k=1}^{m-2}(A - \lambda_kI)\right)v$$

continuing this way yields

$$= \prod_{k=1}^{m}(\mu - \lambda_k)\,v,$$

a contradiction unless $\mu = \lambda_k$ for some $k$. ■

Therefore, these are eigenvectors and eigenvalues with the usual meaning. This leads to the 
following definition. 

Definition 17.3.9 For $A \in \mathcal{L}(V, V)$ where $\dim(V) = n$, the scalars, $\lambda_k$ in the minimal polynomial,

$$p(\lambda) = \prod_{k=1}^{m}(\lambda - \lambda_k) \equiv \prod_{k=1}^{p}(\lambda - \lambda_k)^{r_k}$$

are called the eigenvalues of $A$. In the last expression, $\lambda_k$ is a repeated root which occurs $r_k$ times.
The collection of eigenvalues of $A$ is denoted by $\sigma(A)$. The generalized eigenspaces are

$$\ker(A - \lambda_kI)^{r_k} \equiv V_k.$$

Theorem 17.3.10 In the situation of the above definition,

$$V = V_1 \oplus \cdots \oplus V_p$$

That is, the vector space equals the direct sum of its generalized eigenspaces.

Proof: Since $V = \ker\left(\prod_{k=1}^{p}(A - \lambda_kI)^{r_k}\right)$, the conclusion follows from Theorem 17.3.5. ■
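A small worked example may help fix the ideas. The matrix below is chosen here purely for illustration, with $e_1, e_2, e_3$ denoting the standard basis of $\mathbb{C}^3$. Its eigenvalues are 2 and 3, the product $(A - 2I)(A - 3I)$ is not zero, and $(A - 2I)^2(A - 3I)$ is zero, so the minimal polynomial has a repeated factor.

```latex
A = \begin{pmatrix} 2 & 1 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{pmatrix},
\qquad
(A - 2I)(A - 3I) = \begin{pmatrix} 0 & -1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \neq 0,
\qquad
(A - 2I)^2 (A - 3I) = 0.

% minimal polynomial and generalized eigenspaces
p(\lambda) = (\lambda - 2)^2 (\lambda - 3), \qquad
V_1 = \ker (A - 2I)^2 = \operatorname{span}(e_1, e_2), \qquad
V_2 = \ker (A - 3I) = \operatorname{span}(e_3).
```

Thus $\mathbb{C}^3 = V_1 \oplus V_2$, in agreement with Theorem 17.3.10. Note that $\ker(A - 2I)$ alone is only $\operatorname{span}(e_1)$, which is why the power $r_1 = 2$ is needed.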

17.4 Block Diagonal Matrices 

In this section the vector space will be $\mathbb{C}^n$ and the linear transformations will be those which result
by multiplication by $n \times n$ matrices.

Definition 17.4.1 Let $A$ and $B$ be two $n \times n$ matrices. Then $A$ is similar to $B$, written as $A \sim B$
when there exists an invertible matrix $S$ such that $A = S^{-1}BS$.

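Theorem 17.4.2 Let $A$ be an $n \times n$ matrix. Letting $\lambda_1, \lambda_2, \cdots, \lambda_r$ be the distinct eigenvalues of
$A$, arranged in some order, there exist square matrices $P_1, \cdots, P_r$ such that $A$ is similar to the block
diagonal matrix

$$P = \begin{pmatrix} P_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & P_r \end{pmatrix}$$

in which $P_k$ has the single eigenvalue $\lambda_k$. Denoting by $r_k$ the size of $P_k$ it follows that $r_k$ equals the
dimension of the generalized eigenspace for $\lambda_k$. Furthermore, if $S$ is the matrix satisfying

$$S^{-1}AS = P,$$

then $S$ is of the form

$$\begin{pmatrix} B_1 & \cdots & B_r \end{pmatrix}$$

where $B_k = \begin{pmatrix} u_1^k & \cdots & u_{r_k}^k \end{pmatrix}$ in which the columns, $\{u_1^k, \cdots, u_{r_k}^k\} = D_k$ constitute a basis for $V_{\lambda_k}$.

Proof: By Theorem 17.3.10 and Lemma 17.3.3,

$$\mathbb{C}^n = V_{\lambda_1} \oplus \cdots \oplus V_{\lambda_r}$$

and a basis for $\mathbb{C}^n$ is $\{D_1, \cdots, D_r\}$ where $D_k$ is a basis for $V_{\lambda_k}$, $\ker(A - \lambda_kI)^{r_k}$.
Let

$$S = \begin{pmatrix} B_1 & \cdots & B_r \end{pmatrix}$$

where the $B_i$ are the matrices described in the statement of the theorem. Then $S^{-1}$ must be of the
form

$$S^{-1} = \begin{pmatrix} C_1 \\ \vdots \\ C_r \end{pmatrix}$$

where $C_iB_i = I_{r_i \times r_i}$. Also, if $i \neq j$, then $C_iAB_j = 0$, the last claim holding because $A : V_{\lambda_j} \to V_{\lambda_j}$
so the columns of $AB_j$ are linear combinations of the columns of $B_j$ and each of these columns is
orthogonal to the rows of $C_i$ since $C_iB_j = 0$ if $i \neq j$. Therefore,

$$S^{-1}AS = \begin{pmatrix} C_1 \\ \vdots \\ C_r \end{pmatrix}A\begin{pmatrix} B_1 & \cdots & B_r \end{pmatrix} = \begin{pmatrix} C_1 \\ \vdots \\ C_r \end{pmatrix}\begin{pmatrix} AB_1 & \cdots & AB_r \end{pmatrix}$$

$$= \begin{pmatrix} C_1AB_1 & 0 & \cdots & 0 \\ 0 & C_2AB_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & C_rAB_r \end{pmatrix}$$

and $C_kAB_k$ is an $r_k \times r_k$ matrix.

What about the eigenvalues of $C_kAB_k$? The only eigenvalue of $A$ restricted to $V_{\lambda_k}$ is $\lambda_k$ because
if $Ax = \mu x$ for some $x \in V_{\lambda_k}$, $x \neq 0$, and $\mu \neq \lambda_k$, then

$$(A - \lambda_kI)^{r_k}x = \left(A - \mu I + (\mu - \lambda_k)I\right)^{r_k}x = \sum_{j=0}^{r_k}\binom{r_k}{j}(\mu - \lambda_k)^{r_k - j}(A - \mu I)^jx$$

$$= (\mu - \lambda_k)^{r_k}x \neq 0$$

contrary to the assumption that $x \in V_{\lambda_k}$. Suppose then that $C_kAB_kx = \lambda x$ where $x \neq 0$. Why is
$\lambda = \lambda_k$? Let $y = B_kx$ so $y \in V_{\lambda_k}$. Then

$$S^{-1}Ay = S^{-1}AS\begin{pmatrix} 0 \\ \vdots \\ x \\ \vdots \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ \vdots \\ C_kAB_kx \\ \vdots \\ 0 \end{pmatrix} = \lambda\begin{pmatrix} 0 \\ \vdots \\ x \\ \vdots \\ 0 \end{pmatrix}$$

and so

$$Ay = \lambda S\begin{pmatrix} 0 \\ \vdots \\ x \\ \vdots \\ 0 \end{pmatrix} = \lambda y.$$

Therefore, $\lambda = \lambda_k$ because, as noted above, $\lambda_k$ is the only eigenvalue of $A$ restricted to $V_{\lambda_k}$. Now
let $P_k = C_kAB_k$. ■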

The above theorem contains a result which is of sufficient importance to state as a corollary. 



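Corollary 17.4.3 Let $A$ be an $n \times n$ matrix and let $D_k$ denote a basis for the generalized eigenspace
for $\lambda_k$. Then $\{D_1, \cdots, D_r\}$ is a basis for $\mathbb{C}^n$.

More can be said. Recall Theorem 13.2.11 on Page 338. From this theorem, there exist unitary
matrices, $U_k$ such that $U_k^*P_kU_k = T_k$ where $T_k$ is an upper triangular matrix of the form

$$\begin{pmatrix} \lambda_k & \cdots & * \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \lambda_k \end{pmatrix} \equiv T_k$$

Now let $U$ be the block diagonal matrix defined by

$$U \equiv \begin{pmatrix} U_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & U_r \end{pmatrix}.$$

By Theorem 17.4.2 there exists $S$ such that

$$S^{-1}AS = \begin{pmatrix} P_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & P_r \end{pmatrix}.$$

Therefore,

$$U^*S^{-1}ASU = \begin{pmatrix} U_1^* & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & U_r^* \end{pmatrix}\begin{pmatrix} P_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & P_r \end{pmatrix}\begin{pmatrix} U_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & U_r \end{pmatrix}$$

$$= \begin{pmatrix} U_1^*P_1U_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & U_r^*P_rU_r \end{pmatrix} = \begin{pmatrix} T_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & T_r \end{pmatrix}.$$

This proves most of the following corollary of Theorem 17.4.2.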

Corollary 17.4.4 Let $A$ be an $n \times n$ matrix. Then $A$ is similar to an upper triangular, block
diagonal matrix of the form

$$T \equiv \begin{pmatrix} T_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & T_r \end{pmatrix}$$

where $T_k$ is an upper triangular matrix having only $\lambda_k$ on the main diagonal. The diagonal blocks
can be arranged in any order desired. If $T_k$ is an $m_k \times m_k$ matrix, then

$$m_k = \dim\left(\ker(A - \lambda_kI)^{r_k}\right)$$

where the minimal polynomial of $A$ is

$$\prod_{k=1}^{r}(\lambda - \lambda_k)^{r_k}.$$

Furthermore, $m_k$ is the multiplicity of $\lambda_k$ as a zero of the characteristic polynomial of $A$.

Proof: The only thing which remains is the assertion that $m_k$ equals the multiplicity of $\lambda_k$ as
a zero of the characteristic polynomial. However, this is clear from the observation that since $T$ is
similar to $A$ they have the same characteristic polynomial because

$$\det(A - \lambda I) = \det\left(S(T - \lambda I)S^{-1}\right)$$

$$= \det(S)\det(S^{-1})\det(T - \lambda I)$$

$$= \det(SS^{-1})\det(T - \lambda I)$$

$$= \det(T - \lambda I)$$

and the observation that since $T$ is upper triangular, the characteristic polynomial of $T$ is of the
form

$$\prod_{k=1}^{r}(\lambda_k - \lambda)^{m_k}.$$

The above corollary has tremendous significance especially if it is pushed even further resulting
in the Jordan Canonical form. This form involves still more similarity transformations resulting in
an especially revealing and simple form for each of the $T_k$, but the result of the above corollary is
sufficient for most applications.

It is significant because it enables one to obtain great understanding of powers of $A$ by using the
matrix $T$. From Corollary 17.4.4 there exists an $n \times n$ matrix $S$ (see footnote 2) such that

$$A = S^{-1}TS.$$

Therefore, $A^2 = S^{-1}TSS^{-1}TS = S^{-1}T^2S$ and continuing this way, it follows

$$A^k = S^{-1}T^kS$$

where $T$ is given in the above corollary. Consider $T^k$. By block multiplication,

$$T^k = \begin{pmatrix} T_1^k & & 0 \\ & \ddots & \\ 0 & & T_r^k \end{pmatrix}.$$

The matrix $T_s$ is an $m_s \times m_s$ matrix which is of the form

$$T_s = \begin{pmatrix} \alpha & \cdots & * \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \alpha \end{pmatrix} \qquad (17.4)$$

which can be written in the form

$$T_s = D + N$$

for $D$ a multiple of the identity and $N$ an upper triangular matrix with zeros down the main diagonal.
Therefore, by the Cayley Hamilton theorem, $N^{m_s} = 0$ because the characteristic equation for $N$ is
just $\lambda^{m_s} = 0$. Such a transformation is called nilpotent. You can see $N^{m_s} = 0$ directly also, without
having to use the Cayley Hamilton theorem. Now since $D$ is just a multiple of the identity, it follows
that $DN = ND$. Therefore, the usual binomial theorem may be applied and this yields the following
equations for $k \geq m_s$.

$$T_s^k = (D + N)^k = \sum_{j=0}^{k}\binom{k}{j}D^{k-j}N^j = \sum_{j=0}^{m_s}\binom{k}{j}D^{k-j}N^j, \qquad (17.5)$$

the third equation holding because $N^{m_s} = 0$. Thus $T_s^k$ is of the form

$$\begin{pmatrix} \alpha^k & \cdots & * \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \alpha^k \end{pmatrix}$$
Lemma 17.4.5 Suppose $T$ is of the form $T_s$ described above in 17.4 where the constant $\alpha$ on the main diagonal is less than one in absolute value. Then
\[ \lim_{k \to \infty} \left(T^k\right)_{ij} = 0. \]

2 The $S$ here is written as $S^{-1}$ in the corollary.
Proof: From 17.5, it follows that for large $k$ and $j \leq m_s$,
\[ \binom{k}{j} \leq \frac{k(k-1)\cdots(k-m_s+1)}{m_s!}. \]
Therefore, letting $C$ be the largest value of $\left|\left(N^j\right)_{pq}\right|$ for $0 \leq j \leq m_s$,
\[ \left|\left(T^k\right)_{pq}\right| \leq m_s C \left( \frac{k(k-1)\cdots(k-m_s+1)}{m_s!} \right) |\alpha|^{k-m_s}, \]
which converges to zero as $k \to \infty$. This is most easily seen by applying the ratio test to the series
\[ \sum_{k=m_s}^{\infty} \left( \frac{k(k-1)\cdots(k-m_s+1)}{m_s!} \right) |\alpha|^{k-m_s} \]
and then noting that if a series converges, then the $k$th term converges to zero.
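The decay in Lemma 17.4.5 can be watched numerically. The following is only a sketch: the 3 x 3 block `T`, the value 0.9 on its diagonal, and the helper names are illustrative choices, not taken from the text. The entries above the diagonal first grow and then are overwhelmed by the factor $|\alpha|^{k-m_s}$.

```python
def matmul(X, Y):
    """Multiply two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matpow(T, k):
    """Compute T^k for k >= 1 by repeated multiplication."""
    P = T
    for _ in range(k - 1):
        P = matmul(P, T)
    return P

def max_abs_entry(M):
    """Largest absolute value of any entry of M."""
    return max(abs(x) for row in M for x in row)

# Illustrative block of the form 17.4: alpha = 0.9 on the main diagonal,
# arbitrary (here fairly large) entries above it.
T = [[0.9, 5.0, 7.0],
     [0.0, 0.9, 3.0],
     [0.0, 0.0, 0.9]]

for k in (1, 10, 100, 400):
    # The largest entry grows at first, then tends to zero.
    print(k, max_abs_entry(matpow(T, k)))
```

The early growth comes from the polynomial factor $k(k-1)\cdots(k-m_s+1)$ in the estimate; the geometric factor wins eventually, which is exactly what the ratio test argument says.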



17.5 The Matrix Of A Linear Transformation 

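If $V$ is an $n$ dimensional vector space and $\{\mathbf{v}_1, \cdots, \mathbf{v}_n\}$ is a basis for $V$, there exists a linear map
\[ q : \mathbb{F}^n \to V \]
defined as
\[ q(\mathbf{a}) \equiv \sum_{i=1}^{n} a_i \mathbf{v}_i \]
where
\[ \mathbf{a} = \sum_{i=1}^{n} a_i \mathbf{e}_i \]
for $\mathbf{e}_i$ the standard basis vectors for $\mathbb{F}^n$ consisting of
\[ \mathbf{e}_i = \begin{pmatrix} 0 \\ \vdots \\ 1 \\ \vdots \\ 0 \end{pmatrix} \]
where the one is in the $i$th slot. It is clear that $q$ defined in this way is one to one, onto, and linear. For $\mathbf{v} \in V$, $q^{-1}(\mathbf{v})$ is a list of scalars called the components of $\mathbf{v}$ with respect to the basis $\{\mathbf{v}_1, \cdots, \mathbf{v}_n\}$.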

Definition 17.5.1 Given a linear transformation $L$, mapping $V$ to $W$, where $\{\mathbf{v}_1, \cdots, \mathbf{v}_n\}$ is a basis of $V$ and $\{\mathbf{w}_1, \cdots, \mathbf{w}_m\}$ is a basis for $W$, an $m \times n$ matrix $A = (a_{ij})$ is called the matrix of the transformation $L$ with respect to the given choice of bases for $V$ and $W$, if whenever $\mathbf{v} \in V$, then multiplication of the components of $\mathbf{v}$ by $(a_{ij})$ yields the components of $L\mathbf{v}$.
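The following diagram is descriptive of the definition. Here $q_V$ and $q_W$ are the maps defined above with reference to the bases $\{\mathbf{v}_1, \cdots, \mathbf{v}_n\}$ and $\{\mathbf{w}_1, \cdots, \mathbf{w}_m\}$ respectively.
\[
\begin{array}{ccccc}
 & & L & & \\
\{\mathbf{v}_1, \cdots, \mathbf{v}_n\} & V & \to & W & \{\mathbf{w}_1, \cdots, \mathbf{w}_m\} \\
 & q_V \uparrow & \circ & \uparrow q_W & \\
 & \mathbb{F}^n & \to & \mathbb{F}^m & \\
 & & A & &
\end{array}
\tag{17.6}
\]
Letting $\mathbf{b} \in \mathbb{F}^n$, this requires
\[ \sum_{i,j} a_{ij} b_j \mathbf{w}_i = L \sum_j b_j \mathbf{v}_j = \sum_j b_j L\mathbf{v}_j. \]
Now
\[ L\mathbf{v}_j = \sum_i c_{ij} \mathbf{w}_i \tag{17.7} \]
for some choice of scalars $c_{ij}$ because $\{\mathbf{w}_1, \cdots, \mathbf{w}_m\}$ is a basis for $W$. Hence
\[ \sum_{i,j} a_{ij} b_j \mathbf{w}_i = \sum_j b_j \sum_i c_{ij} \mathbf{w}_i = \sum_{i,j} c_{ij} b_j \mathbf{w}_i. \]
It follows from the linear independence of $\{\mathbf{w}_1, \cdots, \mathbf{w}_m\}$ that
\[ \sum_j a_{ij} b_j = \sum_j c_{ij} b_j \]
for any choice of $\mathbf{b} \in \mathbb{F}^n$ and consequently
\[ a_{ij} = c_{ij} \]
where $c_{ij}$ is defined by 17.7. It may help to write 17.7 in the form
\[ \begin{pmatrix} L\mathbf{v}_1 & \cdots & L\mathbf{v}_n \end{pmatrix} = \begin{pmatrix} \mathbf{w}_1 & \cdots & \mathbf{w}_m \end{pmatrix} C = \begin{pmatrix} \mathbf{w}_1 & \cdots & \mathbf{w}_m \end{pmatrix} A \tag{17.8} \]
where $C = (c_{ij})$, $A = (a_{ij})$.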

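Example 17.5.2 Let
\[ V \equiv \{\text{polynomials of degree 3 or less}\}, \]
\[ W \equiv \{\text{polynomials of degree 2 or less}\}, \]
and $L \equiv D$ where $D$ is the differentiation operator. A basis for $V$ is $\{1, x, x^2, x^3\}$ and a basis for $W$ is $\{1, x, x^2\}$.

What is the matrix of this linear transformation with respect to these bases? Using 17.8,
\[ \begin{pmatrix} 0 & 1 & 2x & 3x^2 \end{pmatrix} = \begin{pmatrix} 1 & x & x^2 \end{pmatrix} C. \]
It follows from this that
\[ C = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{pmatrix}. \]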
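The matrix in Example 17.5.2 can be built column by column, since column $j$ holds the coordinates of $D(x^j) = jx^{j-1}$ in the basis of $W$. The sketch below assumes polynomials are stored as coefficient lists in the order $1, x, x^2, \ldots$; the function names are hypothetical.

```python
def derivative_matrix(n):
    """Matrix of d/dx from polynomials of degree <= n to degree <= n - 1,
    with respect to the bases {1, x, ..., x^n} and {1, x, ..., x^(n-1)}."""
    C = [[0] * (n + 1) for _ in range(n)]
    for j in range(1, n + 1):
        # D(x^j) = j * x^(j-1): entry j in row j-1 of column j.
        C[j - 1][j] = j
    return C

def apply(C, coords):
    """Multiply the matrix C by a column of components."""
    return [sum(c * b for c, b in zip(row, coords)) for row in C]

C = derivative_matrix(3)
print(C)  # [[0, 1, 0, 0], [0, 0, 2, 0], [0, 0, 0, 3]]

# p(x) = 4 + 3x + 2x^2 + x^3 has components (4, 3, 2, 1);
# p'(x) = 3 + 4x + 3x^2 has components (3, 4, 3).
print(apply(C, [4, 3, 2, 1]))  # [3, 4, 3]
```

Multiplying the components of $p$ by $C$ gives the components of $Dp$, which is exactly the requirement in Definition 17.5.1.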

Now consider the important case where $V = \mathbb{F}^n$, $W = \mathbb{F}^m$, and the basis chosen is the standard basis of vectors $\mathbf{e}_i$ described above. Let $L$ be a linear transformation from $\mathbb{F}^n$ to $\mathbb{F}^m$ and let $A$ be the matrix of the transformation with respect to these bases. In this case the coordinate maps $q_V$ and $q_W$ are simply the identity map and the requirement that $A$ is the matrix of the transformation amounts to
\[ \pi_i(L\mathbf{b}) = \pi_i(A\mathbf{b}) \]
where $\pi_i$ denotes the map which takes a vector in $\mathbb{F}^m$ and returns the $i$th entry in the vector, the $i$th component of the vector with respect to the standard basis vectors. Thus, if the components of the vector in $\mathbb{F}^n$ with respect to the standard basis are $(b_1, \cdots, b_n)$,
\[ \mathbf{b} = \begin{pmatrix} b_1 & \cdots & b_n \end{pmatrix}^T = \sum_i b_i \mathbf{e}_i, \]
then
\[ \pi_i(L\mathbf{b}) \equiv (L\mathbf{b})_i = \sum_j a_{ij} b_j. \]

What about the situation where different pairs of bases are chosen for V and W? How are the 
two matrices with respect to these choices related? Consider the following diagram which illustrates 
the situation.
\[
\begin{array}{ccc}
\mathbb{F}^n & \xrightarrow{A_2} & \mathbb{F}^m \\
q_2 \downarrow & \circ & \downarrow p_2 \\
V & \xrightarrow{L} & W \\
q_1 \uparrow & \circ & \uparrow p_1 \\
\mathbb{F}^n & \xrightarrow{A_1} & \mathbb{F}^m
\end{array}
\]
In this diagram $q_i$ and $p_i$ are coordinate maps as described above. From the diagram,
\[ p_1^{-1} p_2 A_2 q_2^{-1} q_1 = A_1, \]
where $q_2^{-1} q_1$ and $p_1^{-1} p_2$ are one to one, onto, and linear maps. Thus the effect of these maps is identical to multiplication by a suitable matrix.

Definition 17.5.3 In the special case where $V = W$ and only one basis is used for $V = W$, this becomes
\[ q_1^{-1} q_2 A_2 q_2^{-1} q_1 = A_1. \]
Letting $S$ be the matrix of the linear transformation $q_2^{-1} q_1$ with respect to the standard basis vectors in $\mathbb{F}^n$,
\[ S^{-1} A_2 S = A_1. \tag{17.9} \]
When this occurs, $A_1$ is said to be similar to $A_2$ and $A \to S^{-1}AS$ is called a similarity transformation.

Here is some terminology. 

Definition 17.5.4 Let S be a set. The symbol, ~ is called an equivalence relation on S if it satisfies 
the following axioms. 

1. $x \sim x$ for all $x \in S$. (Reflexive)

2. If $x \sim y$ then $y \sim x$. (Symmetric)

3. If $x \sim y$ and $y \sim z$, then $x \sim z$. (Transitive)

Definition 17.5.5 [x] denotes the set of all elements of S which are equivalent to x and [x] is called 
the equivalence class determined by x or just the equivalence class of x. 

With the above definition one can prove the following simple theorem which you should do if 
you have not seen it. 

Theorem 17.5.6 Let $\sim$ be an equivalence relation defined on a set $S$ and let $\mathcal{H}$ denote the set of equivalence classes. Then if $[x]$ and $[y]$ are two of these equivalence classes, either $x \sim y$ and $[x] = [y]$ or it is not true that $x \sim y$ and $[x] \cap [y] = \emptyset$.

Theorem 17.5.7 In the vector space of $n \times n$ matrices, define
\[ A \sim B \]
if there exists an invertible matrix $S$ such that
\[ A = S^{-1}BS. \]
Then $\sim$ is an equivalence relation and $A \sim B$ if and only if whenever $V$ is an $n$ dimensional vector space, there exists $L \in \mathcal{L}(V, V)$ and bases $\{\mathbf{v}_1, \cdots, \mathbf{v}_n\}$ and $\{\mathbf{w}_1, \cdots, \mathbf{w}_n\}$ such that $A$ is the matrix of $L$ with respect to $\{\mathbf{v}_1, \cdots, \mathbf{v}_n\}$ and $B$ is the matrix of $L$ with respect to $\{\mathbf{w}_1, \cdots, \mathbf{w}_n\}$.
Proof: $A \sim A$ because $S = I$ works in the definition. If $A \sim B$, then $B \sim A$, because
\[ A = S^{-1}BS \]
implies
\[ B = SAS^{-1}. \]
If $A \sim B$ and $B \sim C$, then
\[ A = S^{-1}BS, \quad B = T^{-1}CT \]
and so
\[ A = S^{-1}T^{-1}CTS = (TS)^{-1}CTS \]
which implies $A \sim C$. This verifies the first part of the conclusion.

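Now let $V$ be an $n$ dimensional vector space, $A \sim B$ and pick a basis for $V$,
\[ \{\mathbf{v}_1, \cdots, \mathbf{v}_n\}. \]
Define $L \in \mathcal{L}(V, V)$ by
\[ L\mathbf{v}_i \equiv \sum_j a_{ji} \mathbf{v}_j \]
where $A = (a_{ij})$, so that $A$ is the matrix of $L$ with respect to this basis. Then if $B = (b_{ij})$, and $S = (s_{ij})$ is the matrix which provides the similarity transformation,
\[ A = S^{-1}BS, \]
between $A$ and $B$, it follows that
\[ L\mathbf{v}_i = \sum_{r,s,j} \left(s^{-1}\right)_{jr} b_{rs} s_{si} \mathbf{v}_j. \tag{17.10} \]
Now define
\[ \mathbf{w}_k \equiv \sum_j \left(s^{-1}\right)_{jk} \mathbf{v}_j. \]
These vectors form a basis because $S^{-1}$ is invertible. Then from 17.10,
\[ \sum_i \left(s^{-1}\right)_{ik} L\mathbf{v}_i = \sum_{i,j,r,s} \left(s^{-1}\right)_{jr} b_{rs} s_{si} \left(s^{-1}\right)_{ik} \mathbf{v}_j = \sum_{j,r} \left(s^{-1}\right)_{jr} b_{rk} \mathbf{v}_j \]
and so
\[ L\mathbf{w}_k = \sum_r b_{rk} \mathbf{w}_r. \]
Thus $B$ is the matrix of $L$ with respect to $\{\mathbf{w}_1, \cdots, \mathbf{w}_n\}$. This proves the theorem because the if part of the conclusion was established earlier. ■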

What if the linear transformation consists of multiplication by a matrix A and you want to find 
the matrix of this linear transformation with respect to another basis? Is there an easy way to do 
it? The answer is yes. 

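Proposition 17.5.8 Let $A$ be an $m \times n$ matrix and let $L$ be the linear transformation which is defined by
\[ L\left( \sum_{k=1}^{n} x_k \mathbf{e}_k \right) \equiv \sum_{k=1}^{n} (A\mathbf{e}_k) x_k \equiv \sum_{i=1}^{m} \sum_{k=1}^{n} A_{ik} x_k \mathbf{e}_i. \]
In simple language, to find $L\mathbf{x}$, you multiply on the left of $\mathbf{x}$ by $A$. Then the matrix $M$ of this linear transformation with respect to the bases $\{\mathbf{u}_1, \cdots, \mathbf{u}_n\}$ for $\mathbb{F}^n$ and $\{\mathbf{w}_1, \cdots, \mathbf{w}_m\}$ for $\mathbb{F}^m$ is given by
\[ M = \begin{pmatrix} \mathbf{w}_1 & \cdots & \mathbf{w}_m \end{pmatrix}^{-1} A \begin{pmatrix} \mathbf{u}_1 & \cdots & \mathbf{u}_n \end{pmatrix} \]
where $\begin{pmatrix} \mathbf{w}_1 & \cdots & \mathbf{w}_m \end{pmatrix}$ is the $m \times m$ matrix which has $\mathbf{w}_j$ as its $j$th column.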
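Proof: Consider the following diagram.
\[
\begin{array}{ccccc}
 & & L & & \\
\{\mathbf{u}_1, \cdots, \mathbf{u}_n\} & \mathbb{F}^n & \to & \mathbb{F}^m & \{\mathbf{w}_1, \cdots, \mathbf{w}_m\} \\
 & q_V \uparrow & \circ & \uparrow q_W & \\
 & \mathbb{F}^n & \to & \mathbb{F}^m & \\
 & & M & &
\end{array}
\]
Here the coordinate maps are defined in the usual way. Thus
\[ q_V \begin{pmatrix} x_1 & \cdots & x_n \end{pmatrix}^T \equiv \sum_{i=1}^{n} x_i \mathbf{u}_i. \]
Therefore, $q_V$ can be considered the same as multiplication of a vector in $\mathbb{F}^n$ on the left by the matrix
\[ \begin{pmatrix} \mathbf{u}_1 & \cdots & \mathbf{u}_n \end{pmatrix}. \]
Similar considerations apply to $q_W$. Thus it is desired to have the following for an arbitrary $\mathbf{x} \in \mathbb{F}^n$.
\[ A \begin{pmatrix} \mathbf{u}_1 & \cdots & \mathbf{u}_n \end{pmatrix} \mathbf{x} = \begin{pmatrix} \mathbf{w}_1 & \cdots & \mathbf{w}_m \end{pmatrix} M \mathbf{x} \]
Therefore, the conclusion of the proposition follows. ■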
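Here is a small check of Proposition 17.5.8 using exact rational arithmetic. It is a sketch under stated assumptions: the matrix `A` and the two bases (the columns of `U` and `W`) are made-up illustrative data, and `inverse_2x2` handles only the 2 x 2 case needed here.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def inverse_2x2(W):
    """Exact inverse of a 2 x 2 matrix whose columns form a basis."""
    (a, b), (c, d) = W
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("the columns of W are not a basis")
    return [[d / det, -b / det], [-c / det, a / det]]

# L is multiplication by A, mapping F^3 to F^2.
A = [[1, 0, 2],
     [0, 1, 4]]
# Columns of U: basis u1, u2, u3 of F^3.  Columns of W: basis w1, w2 of F^2.
U = [[1, 1, 1],
     [0, 1, 1],
     [0, 0, 1]]
W = [[1, 1],
     [1, 2]]

# M = (w1 w2)^(-1) A (u1 u2 u3)
M = matmul(inverse_2x2(W), matmul(A, U))
print(M)

# Defining property from the proof: A U x = W M x for every x,
# which holds exactly when A U = W M.
print(matmul(A, U) == matmul(W, M))
```

For instance, the third column of `M` is $(1, 2)$, which says $L\mathbf{u}_3 = (3, 5) = 1 \cdot \mathbf{w}_1 + 2 \cdot \mathbf{w}_2$.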

Definition 17.5.9 An $n \times n$ matrix $A$ is diagonalizable if there exists an invertible $n \times n$ matrix $S$ such that $S^{-1}AS = D$, where $D$ is a diagonal matrix. Thus $D$ has zero entries everywhere except on the main diagonal. Write $\operatorname{diag}(\lambda_1, \cdots, \lambda_n)$ to denote the diagonal matrix having the $\lambda_i$ down the main diagonal.

Which matrices are diagonalizable?

Theorem 17.5.10 Let $A$ be an $n \times n$ matrix. Then $A$ is diagonalizable if and only if $\mathbb{F}^n$ has a basis of eigenvectors of $A$. In this case, $S$ of Definition 17.5.9 consists of the $n \times n$ matrix whose columns are the eigenvectors of $A$ and $D = \operatorname{diag}(\lambda_1, \cdots, \lambda_n)$.

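Proof: Suppose first that $\mathbb{F}^n$ has a basis of eigenvectors, $\{\mathbf{v}_1, \cdots, \mathbf{v}_n\}$ where $A\mathbf{v}_i = \lambda_i \mathbf{v}_i$. Then let $S$ denote the matrix $\begin{pmatrix} \mathbf{v}_1 & \cdots & \mathbf{v}_n \end{pmatrix}$ and let
\[ S^{-1} \equiv \begin{pmatrix} \mathbf{u}_1^T \\ \vdots \\ \mathbf{u}_n^T \end{pmatrix} \]
where
\[ \mathbf{u}_i^T \mathbf{v}_j = \delta_{ij} \equiv \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j. \end{cases} \]
$S^{-1}$ exists because $S$ has rank $n$. Then from block multiplication,
\[
S^{-1}AS = \begin{pmatrix} \mathbf{u}_1^T \\ \vdots \\ \mathbf{u}_n^T \end{pmatrix} \begin{pmatrix} A\mathbf{v}_1 & \cdots & A\mathbf{v}_n \end{pmatrix}
= \begin{pmatrix} \mathbf{u}_1^T \\ \vdots \\ \mathbf{u}_n^T \end{pmatrix} \begin{pmatrix} \lambda_1\mathbf{v}_1 & \cdots & \lambda_n\mathbf{v}_n \end{pmatrix}
= \begin{pmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{pmatrix} = D.
\]
Next suppose $A$ is diagonalizable so $S^{-1}AS = D \equiv \operatorname{diag}(\lambda_1, \cdots, \lambda_n)$. Then the columns of $S$ form a basis because $S^{-1}$ is given to exist. It only remains to verify that these columns of $S$ are eigenvectors. But letting $S = \begin{pmatrix} \mathbf{v}_1 & \cdots & \mathbf{v}_n \end{pmatrix}$, $AS = SD$ and so $\begin{pmatrix} A\mathbf{v}_1 & \cdots & A\mathbf{v}_n \end{pmatrix} = \begin{pmatrix} \lambda_1\mathbf{v}_1 & \cdots & \lambda_n\mathbf{v}_n \end{pmatrix}$ which shows that $A\mathbf{v}_i = \lambda_i \mathbf{v}_i$. ■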
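Theorem 17.5.10 can be tried on a concrete matrix. In the sketch below the symmetric matrix `A` and its eigenvectors $(1,1)$ and $(1,-1)$ are illustrative choices, not an example from the text; the helpers are hypothetical and work only for the 2 x 2 case.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def inverse_2x2(S):
    """Exact inverse of a 2 x 2 matrix with nonzero determinant."""
    (a, b), (c, d) = S
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("S is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[2, 1],
     [1, 2]]
# Columns of S are eigenvectors: A(1,1) = 3(1,1) and A(1,-1) = 1(1,-1).
S = [[1, 1],
     [1, -1]]
S_inv = inverse_2x2(S)

# S^(-1) A S should be diag(3, 1).
D = matmul(S_inv, matmul(A, S))
print(D)

# Powers become easy: A^5 = S D^5 S^(-1), and D^5 is computed entrywise.
D5 = [[D[0][0] ** 5, 0], [0, D[1][1] ** 5]]
A5 = matmul(S, matmul(D5, S_inv))
print(A5)
```

The last two lines illustrate why diagonalization is useful: the fifth power of $A$ costs only the fifth powers of two scalars plus two matrix products.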

It makes sense to speak of the determinant of a linear transformation as described in the following corollary.

Corollary 17.5.11 Let $L \in \mathcal{L}(V, V)$ where $V$ is an $n$ dimensional vector space and let $A$ be the matrix of this linear transformation with respect to a basis on $V$. Then it is possible to define
\[ \det(L) \equiv \det(A). \]
Proof: Each choice of basis for $V$ determines a matrix for $L$ with respect to the basis. If $A$ and $B$ are two such matrices, it follows from Theorem 17.5.7 that
\[ A = S^{-1}BS \]
and so
\[ \det(A) = \det(S^{-1})\det(B)\det(S). \]
But
\[ 1 = \det(I) = \det(S^{-1}S) = \det(S)\det(S^{-1}) \]
and so
\[ \det(A) = \det(B). \] ■

Definition 17.5.12 Let $A \in \mathcal{L}(X, Y)$ where $X$ and $Y$ are finite dimensional vector spaces. Define $\operatorname{rank}(A)$ to equal the dimension of $A(X)$.

The following theorem explains how the rank of $A$ is related to the rank of the matrix of $A$.

Theorem 17.5.13 Let $A \in \mathcal{L}(X, Y)$. Then $\operatorname{rank}(A) = \operatorname{rank}(M)$ where $M$ is the matrix of $A$ taken with respect to a pair of bases for the vector spaces $X$ and $Y$.



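Proof: Recall the diagram which describes what is meant by the matrix of $A$. Here the two bases are as indicated.
\[
\begin{array}{ccccc}
\{\mathbf{v}_1, \cdots, \mathbf{v}_n\} & X & \xrightarrow{A} & Y & \{\mathbf{w}_1, \cdots, \mathbf{w}_m\} \\
 & q_X \uparrow & \circ & \uparrow q_Y & \\
 & \mathbb{F}^n & \xrightarrow{M} & \mathbb{F}^m &
\end{array}
\]
Let $\{\mathbf{z}_1, \cdots, \mathbf{z}_r\}$ be a basis for $A(X)$. Then since the linear transformation $q_Y$ is one to one and onto, $\{q_Y^{-1}\mathbf{z}_1, \cdots, q_Y^{-1}\mathbf{z}_r\}$ is a linearly independent set of vectors in $\mathbb{F}^m$. Let $A\mathbf{u}_i = \mathbf{z}_i$. Then
\[ M q_X^{-1} \mathbf{u}_i = q_Y^{-1} \mathbf{z}_i \]
and so the dimension of $M(\mathbb{F}^n)$ is at least $r$. Now if the dimension of $M(\mathbb{F}^n)$ is greater than $r$, then there exists
\[ \mathbf{y} \in M(\mathbb{F}^n) \setminus \operatorname{span}\{q_Y^{-1}\mathbf{z}_1, \cdots, q_Y^{-1}\mathbf{z}_r\}. \]
But then there exists $\mathbf{x} \in \mathbb{F}^n$ with $M\mathbf{x} = \mathbf{y}$. Hence
\[ \mathbf{y} = M\mathbf{x} = q_Y^{-1} A q_X \mathbf{x} \in \operatorname{span}\{q_Y^{-1}\mathbf{z}_1, \cdots, q_Y^{-1}\mathbf{z}_r\}, \]
a contradiction. ■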

The following result is a summary of many concepts. 

Theorem 17.5.14 Let $L \in \mathcal{L}(V, V)$ where $V$ is a finite dimensional vector space. Then the following are equivalent.

1. $L$ is one to one.

2. $L$ maps a basis to a basis.

3. $L$ is onto.

4. $\det(L) \neq 0$

5. If $L\mathbf{v} = \mathbf{0}$ then $\mathbf{v} = \mathbf{0}$.

Proof: Suppose first $L$ is one to one and let $\{\mathbf{v}_i\}_{i=1}^n$ be a basis. Then if $\sum_{i=1}^n c_i L\mathbf{v}_i = \mathbf{0}$ it follows $L\left(\sum_{i=1}^n c_i \mathbf{v}_i\right) = \mathbf{0}$ which means that since $L(\mathbf{0}) = \mathbf{0}$, and $L$ is one to one, it must be the case that $\sum_{i=1}^n c_i \mathbf{v}_i = \mathbf{0}$. Since $\{\mathbf{v}_i\}$ is a basis, each $c_i = 0$ which shows $\{L\mathbf{v}_i\}$ is a linearly independent set. Since there are $n$ of these, it must be that this is a basis.

Now suppose 2.). Then letting $\{\mathbf{v}_i\}$ be a basis, and $\mathbf{y} \in V$, it follows from part 2.) that there are constants $\{c_i\}$ such that $\mathbf{y} = \sum_{i=1}^n c_i L\mathbf{v}_i = L\left(\sum_{i=1}^n c_i \mathbf{v}_i\right)$. Thus $L$ is onto. It has been shown that 2.) implies 3.).

Now suppose 3.). Then the operation consisting of multiplication by the matrix of $L$, $M_L$, must be onto. However, the vectors in $\mathbb{F}^n$ so obtained consist of linear combinations of the columns of $M_L$. Therefore, the column rank of $M_L$ is $n$. By Theorem 8.5.7 this equals the determinant rank and so $\det(M_L) \equiv \det(L) \neq 0$.

Now assume 4.). If $L\mathbf{v} = \mathbf{0}$ for some $\mathbf{v} \neq \mathbf{0}$, it follows that $M_L\mathbf{x} = \mathbf{0}$ for some $\mathbf{x} \neq \mathbf{0}$. Therefore, the columns of $M_L$ are linearly dependent and so by Theorem 8.5.7, $\det(M_L) = \det(L) = 0$ contrary to 4.). Therefore, 4.) implies 5.).

Now suppose 5.) and suppose $L\mathbf{v} = L\mathbf{w}$. Then $L(\mathbf{v} - \mathbf{w}) = \mathbf{0}$ and so by 5.), $\mathbf{v} - \mathbf{w} = \mathbf{0}$ showing that $L$ is one to one. ■

Also it is important to note that composition of linear transformations corresponds to multiplication of the matrices. Consider the following diagram.
\[
\begin{array}{ccccc}
X & \xrightarrow{A} & Y & \xrightarrow{B} & Z \\
q_X \uparrow & \circ & \uparrow q_Y & \circ & \uparrow q_Z \\
\mathbb{F}^n & \xrightarrow{M_A} & \mathbb{F}^m & \xrightarrow{M_B} & \mathbb{F}^p
\end{array}
\]
where $A$ and $B$ are two linear transformations, $A \in \mathcal{L}(X, Y)$ and $B \in \mathcal{L}(Y, Z)$. Then $B \circ A \in \mathcal{L}(X, Z)$ and so it has a matrix with respect to bases given on $X$ and $Z$, the coordinate maps for these bases being $q_X$ and $q_Z$ respectively. Then
\[ B \circ A = q_Z M_B q_Y^{-1} q_Y M_A q_X^{-1} = q_Z M_B M_A q_X^{-1}. \]
But this shows that $M_B M_A$ plays the role of $M_{B \circ A}$, the matrix of $B \circ A$. Hence the matrix of $B \circ A$ equals the product $M_B M_A$ of the two matrices. Of course it is interesting to note that although $M_{B \circ A}$ must be unique, the matrices $M_B$ and $M_A$ are not unique, depending on the basis chosen for $Y$.

Theorem 17.5.15 The matrix of the composition of linear transformations equals the product of 
the matrices of these linear transformations. 
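Theorem 17.5.15 can be checked on the differentiation operator of Example 17.5.2. The sketch below assumes monomial bases throughout and uses hypothetical helper names: differentiating twice, from degree at most 3 down to degree at most 1, should have the product of the two single-derivative matrices as its matrix.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def derivative_matrix(n):
    """Matrix of D from degree <= n to degree <= n - 1 (monomial bases)."""
    C = [[0] * (n + 1) for _ in range(n)]
    for j in range(1, n + 1):
        C[j - 1][j] = j  # D(x^j) = j x^(j-1)
    return C

def second_derivative_matrix(n):
    """Matrix of D^2 from degree <= n to degree <= n - 2, built directly."""
    C = [[0] * (n + 1) for _ in range(n - 1)]
    for j in range(2, n + 1):
        C[j - 2][j] = j * (j - 1)  # D^2(x^j) = j (j-1) x^(j-2)
    return C

M_A = derivative_matrix(3)  # 3 x 4: degree <= 3 to degree <= 2
M_B = derivative_matrix(2)  # 2 x 3: degree <= 2 to degree <= 1

product = matmul(M_B, M_A)            # matrix of the composition B o A
direct = second_derivative_matrix(3)  # matrix of D^2 computed directly

print(product)  # [[0, 0, 2, 0], [0, 0, 0, 6]]
print(product == direct)  # True
```

Note the order of the factors: the transformation applied first sits on the right, matching $B \circ A \leftrightarrow M_B M_A$.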

17.5.1 Some Geometrically Defined Linear Transformations 

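This is a review of earlier material. If $T$ is any linear transformation which maps $\mathbb{F}^n$ to $\mathbb{F}^m$, there is always an $m \times n$ matrix $A$ with the property that
\[ A\mathbf{x} = T\mathbf{x} \tag{17.11} \]
for all $\mathbf{x} \in \mathbb{F}^n$. How does this relate to what is discussed above? In terms of the above diagram,
\[
\begin{array}{ccccc}
\{\mathbf{e}_1, \cdots, \mathbf{e}_n\} & \mathbb{F}^n & \xrightarrow{T} & \mathbb{F}^m & \{\mathbf{e}_1, \cdots, \mathbf{e}_m\} \\
 & q_{\mathbb{F}^n} \uparrow & \circ & \uparrow q_{\mathbb{F}^m} & \\
 & \mathbb{F}^n & \xrightarrow{M} & \mathbb{F}^m &
\end{array}
\]
where
\[ q_{\mathbb{F}^n}(\mathbf{x}) \equiv \sum_{i=1}^{n} x_i \mathbf{e}_i = \mathbf{x}. \]
Thus those two maps are really just the identity map. Thus, to find the matrix of the linear transformation $T$ with respect to the standard basis vectors,
\[ T\mathbf{e}_k = M\mathbf{e}_k. \]
In other words, the $k$th column of $M$ equals $T\mathbf{e}_k$ as noted earlier. All the earlier considerations apply. These considerations were just a specialization to the case of the standard basis vectors of this more general notion which was just presented.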
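17.5.2 Rotations About A Given Vector

As an application, I will consider the problem of rotating counter clockwise about a given unit vector which is possibly not one of the unit vectors in coordinate directions. First consider a pair of perpendicular unit vectors, $\mathbf{u}_1$ and $\mathbf{u}_2$, and the problem of rotating in the counterclockwise direction about $\mathbf{u}_3$ where $\mathbf{u}_3 = \mathbf{u}_1 \times \mathbf{u}_2$ so that $\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3$ forms a right handed orthogonal coordinate system. Thus the vector $\mathbf{u}_3$ is coming out of the page.

Let $T$ denote the desired rotation. Then
\[
\begin{aligned}
T(a\mathbf{u}_1 + b\mathbf{u}_2 + c\mathbf{u}_3) &= aT\mathbf{u}_1 + bT\mathbf{u}_2 + cT\mathbf{u}_3 \\
&= (a\cos\theta - b\sin\theta)\mathbf{u}_1 + (a\sin\theta + b\cos\theta)\mathbf{u}_2 + c\mathbf{u}_3.
\end{aligned}
\]
Thus in terms of the basis $\{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3\}$, the matrix of this transformation is
\[ \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}. \]
I want to write this transformation in terms of the usual basis vectors, $\{\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3\}$. From Proposition 17.5.8, if $A$ is this matrix,
\[ \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} \mathbf{u}_1 & \mathbf{u}_2 & \mathbf{u}_3 \end{pmatrix}^{-1} A \begin{pmatrix} \mathbf{u}_1 & \mathbf{u}_2 & \mathbf{u}_3 \end{pmatrix} \]
and so you can solve for $A$ if you know the $\mathbf{u}_i$.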

Suppose the unit vector about which the counterclockwise rotation takes place is $(a, b, c)$. Then I obtain vectors $\mathbf{u}_1$ and $\mathbf{u}_2$ such that $\{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3\}$ is a right handed orthogonal system with $\mathbf{u}_3 = (a, b, c)$ and then use the above result. It is of course somewhat arbitrary how this is accomplished. I will assume, however, that $|c| \neq 1$ since otherwise you are looking at either clockwise or counter clockwise rotation about the positive $z$ axis and this is a problem which has been dealt with earlier. (If $c = -1$, it amounts to clockwise rotation about the positive $z$ axis while if $c = 1$, it is counterclockwise rotation about the positive $z$ axis.) Then let $\mathbf{u}_3 = (a, b, c)$ and $\mathbf{u}_2 \equiv \frac{1}{\sqrt{a^2 + b^2}}(b, -a, 0)$. This one is perpendicular to $\mathbf{u}_3$. If $\{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3\}$ is to be a right hand system it is necessary to have
\[ \mathbf{u}_1 = \mathbf{u}_2 \times \mathbf{u}_3 = \frac{1}{\sqrt{(a^2 + b^2)(a^2 + b^2 + c^2)}} \left(-ac, -bc, a^2 + b^2\right). \]
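Now recall that $\mathbf{u}_3$ is a unit vector and so the above equals
\[ \frac{1}{\sqrt{a^2 + b^2}} \left(-ac, -bc, a^2 + b^2\right). \]
Then from the above, $A$ is given by
\[
\begin{pmatrix}
\frac{-ac}{\sqrt{a^2+b^2}} & \frac{b}{\sqrt{a^2+b^2}} & a \\
\frac{-bc}{\sqrt{a^2+b^2}} & \frac{-a}{\sqrt{a^2+b^2}} & b \\
\sqrt{a^2+b^2} & 0 & c
\end{pmatrix}
\begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix}
\frac{-ac}{\sqrt{a^2+b^2}} & \frac{b}{\sqrt{a^2+b^2}} & a \\
\frac{-bc}{\sqrt{a^2+b^2}} & \frac{-a}{\sqrt{a^2+b^2}} & b \\
\sqrt{a^2+b^2} & 0 & c
\end{pmatrix}^{-1}.
\]
Of course the matrix is an orthogonal matrix so it is easy to take the inverse by simply taking the transpose. Then doing the computation and then some simplification yields
\[
\begin{pmatrix}
a^2 + (1 - a^2)\cos\theta & ab(1 - \cos\theta) - c\sin\theta & ac(1 - \cos\theta) + b\sin\theta \\
ab(1 - \cos\theta) + c\sin\theta & b^2 + (1 - b^2)\cos\theta & bc(1 - \cos\theta) - a\sin\theta \\
ac(1 - \cos\theta) - b\sin\theta & bc(1 - \cos\theta) + a\sin\theta & c^2 + (1 - c^2)\cos\theta
\end{pmatrix}.
\tag{17.12}
\]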
With this, it is clear how to rotate clockwise about the unit vector $(a, b, c)$. Just rotate counter clockwise through an angle of $-\theta$. Thus the matrix for this clockwise rotation is just
\[
\begin{pmatrix}
a^2 + (1 - a^2)\cos\theta & ab(1 - \cos\theta) + c\sin\theta & ac(1 - \cos\theta) - b\sin\theta \\
ab(1 - \cos\theta) - c\sin\theta & b^2 + (1 - b^2)\cos\theta & bc(1 - \cos\theta) + a\sin\theta \\
ac(1 - \cos\theta) + b\sin\theta & bc(1 - \cos\theta) - a\sin\theta & c^2 + (1 - c^2)\cos\theta
\end{pmatrix}.
\]
In deriving 17.12 it was assumed that $c \neq \pm 1$ but even in this case, it gives the correct answer. Suppose for example that $c = 1$ so you are rotating in the counter clockwise direction about the positive $z$ axis. Then $a, b$ are both equal to zero and 17.12 reduces to the correct matrix for rotation about the positive $z$ axis.
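Formula 17.12 translates directly into code. The following is a sketch only: the function name is hypothetical, and normalizing the axis is an added convenience (the text assumes $(a, b, c)$ is already a unit vector).

```python
import math

def rotation_about(axis, theta):
    """Matrix 17.12: counterclockwise rotation by theta about the given axis.

    The axis is normalized first; this step is an assumption of the sketch,
    since the derivation in the text starts from a unit vector.
    """
    a, b, c = axis
    norm = math.sqrt(a * a + b * b + c * c)
    a, b, c = a / norm, b / norm, c / norm
    C, S = math.cos(theta), math.sin(theta)
    return [
        [a * a + (1 - a * a) * C, a * b * (1 - C) - c * S, a * c * (1 - C) + b * S],
        [a * b * (1 - C) + c * S, b * b + (1 - b * b) * C, b * c * (1 - C) - a * S],
        [a * c * (1 - C) - b * S, b * c * (1 - C) + a * S, c * c + (1 - c * c) * C],
    ]

def apply(R, v):
    """Multiply the 3 x 3 matrix R by the vector v."""
    return [sum(R[i][j] * v[j] for j in range(3)) for i in range(3)]

# The case c = 1: rotation by 90 degrees about the positive z axis
# sends e1 to e2, up to rounding error.
Rz = rotation_about((0, 0, 1), math.pi / 2)
print(apply(Rz, [1, 0, 0]))

# The axis itself is left fixed by the rotation.
R = rotation_about((1, 2, 2), 0.7)
print(apply(R, [1 / 3, 2 / 3, 2 / 3]))
```

The clockwise rotation of the text is obtained by calling `rotation_about(axis, -theta)`, which for this matrix is the same as taking the transpose.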

17.6 Exercises 

1. Find the matrix with respect to the standard basis vectors for the linear transformation which rotates every vector in $\mathbb{R}^2$ through an angle of $\pi/3$.

2. Find the matrix with respect to the standard basis vectors for the linear transformation which rotates every vector in $\mathbb{R}^2$ through an angle of $\pi/4$.

3. Find the matrix with respect to the standard basis vectors for the linear transformation which rotates every vector in $\mathbb{R}^2$ through an angle of $\pi/12$. Hint: Note that $\pi/12 = \pi/3 - \pi/4$.

4. Find the matrix with respect to the standard basis vectors for the linear transformation which rotates every vector in $\mathbb{R}^2$ through an angle of $2\pi/3$ and then reflects across the $x$ axis.

5. Find the matrix with respect to the standard basis vectors for the linear transformation which rotates every vector in $\mathbb{R}^2$ through an angle of $\pi/3$ and then reflects across the $y$ axis.

6. Find the matrix with respect to the standard basis vectors for the linear transformation which rotates every vector in $\mathbb{R}^2$ through an angle of $5\pi/12$. Hint: Note that $5\pi/12 = 2\pi/3 - \pi/4$.

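7. Let $V$ be an inner product space and $\mathbf{u} \neq \mathbf{0}$. Show that the function $T_{\mathbf{u}}$ defined by $T_{\mathbf{u}}(\mathbf{v}) \equiv \mathbf{v} - \operatorname{proj}_{\mathbf{u}}(\mathbf{v})$ is also a linear transformation. Here
\[ \operatorname{proj}_{\mathbf{u}}(\mathbf{v}) \equiv \frac{\langle \mathbf{v}, \mathbf{u} \rangle}{|\mathbf{u}|^2} \mathbf{u}. \]
Now show directly from the axioms of the inner product that
\[ \langle T_{\mathbf{u}}\mathbf{v}, \mathbf{u} \rangle = 0. \]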

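8. Let $V$ be a finite dimensional inner product space, the field of scalars equal to either $\mathbb{R}$ or $\mathbb{C}$. Verify that $f$ given by $f\mathbf{v} \equiv \langle \mathbf{v}, \mathbf{z} \rangle$ is in $\mathcal{L}(V, \mathbb{F})$. Next suppose $f$ is an arbitrary element of $\mathcal{L}(V, \mathbb{F})$. Show the following.

(a) If $f = 0$, the zero mapping, then $f\mathbf{v} = \langle \mathbf{v}, \mathbf{0} \rangle$ for all $\mathbf{v} \in V$.

(b) If $f \neq 0$ then there exists $\mathbf{z} \neq \mathbf{0}$ satisfying $\langle \mathbf{u}, \mathbf{z} \rangle = 0$ for all $\mathbf{u} \in \ker(f)$.

(c) Explain why $f(\mathbf{y})\mathbf{z} - f(\mathbf{z})\mathbf{y} \in \ker(f)$.

(d) Use part b. to show that there exists $\mathbf{w}$ such that $f(\mathbf{y}) = \langle \mathbf{y}, \mathbf{w} \rangle$ for all $\mathbf{y} \in V$.

(e) Show there is at most one such $\mathbf{w}$.

You have now proved the Riesz representation theorem which states that every $f \in \mathcal{L}(V, \mathbb{F})$ is of the form
\[ f(\mathbf{y}) = \langle \mathbf{y}, \mathbf{w} \rangle \]
for a unique $\mathbf{w} \in V$.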
9. ↑Let $A \in \mathcal{L}(V, W)$ where $V, W$ are two finite dimensional inner product spaces, both having field of scalars equal to $\mathbb{F}$ which is either $\mathbb{R}$ or $\mathbb{C}$. Let $f \in \mathcal{L}(V, \mathbb{F})$ be given by
\[ f(\mathbf{y}) \equiv \langle A\mathbf{y}, \mathbf{z} \rangle \]
where $\langle \cdot, \cdot \rangle$ now refers to the inner product in $W$. Use the above problem to verify that there exists a unique $\mathbf{w} \in V$ such that $f(\mathbf{y}) = \langle \mathbf{y}, \mathbf{w} \rangle$, the inner product here being the one on $V$. Let $A^*\mathbf{z} \equiv \mathbf{w}$. Show that $A^* \in \mathcal{L}(W, V)$ and by construction,
\[ \langle A\mathbf{y}, \mathbf{z} \rangle = \langle \mathbf{y}, A^*\mathbf{z} \rangle. \]
In the case that $V = \mathbb{F}^n$ and $W = \mathbb{F}^m$ and $A$ consists of multiplication on the left by an $m \times n$ matrix, give a description of $A^*$.

10. Let $A$ be the linear transformation defined on the vector space of smooth functions (those which have all derivatives) given by $Af = (D^2 + 2D + 1)f$, where $D$ is the differentiation operator. Find $\ker(A)$. Hint: First solve $(D + 1)z = 0$. Then solve $(D + 1)y = z$.

11. Let $A$ be the linear transformation defined on the vector space of smooth functions (those which have all derivatives) given by $Af = (D^2 + 5D + 4)f$. Find $\ker(A)$. Note that you could first find $\ker(D + 4)$ where $D$ is the differentiation operator and then consider $\ker((D + 1)(D + 4)) = \ker(A)$ and consider Sylvester's theorem.

12. Suppose $A\mathbf{x} = \mathbf{b}$ has a solution where $A$ is a linear transformation. Explain why the solution is unique precisely when $A\mathbf{x} = \mathbf{0}$ has only the trivial (zero) solution.

13. Verify the linear transformation determined by the matrix
\[ \begin{pmatrix} 1 & 0 & 2 \\ 0 & 1 & 4 \end{pmatrix} \]
maps $\mathbb{R}^3$ onto $\mathbb{R}^2$ but the linear transformation determined by this matrix is not one to one.

14. Let $L$ be the linear transformation taking polynomials of degree at most three to polynomials of degree at most three given by
\[ D^2 + 2D + 1 \]
where $D$ is the differentiation operator. Find the matrix of this linear transformation relative to the basis $\{1, x, x^2, x^3\}$. Find the matrix directly and then find the matrix with respect to the differential operator $D + 1$ and multiply this matrix by itself. You should get the same thing. Why?

15. Let $L$ be the linear transformation taking polynomials of degree at most three to polynomials of degree at most three given by
\[ D^2 + 5D + 4 \]
where $D$ is the differentiation operator. Find the matrix of this linear transformation relative to the basis $\{1, x, x^2, x^3\}$. Find the matrix directly and then find the matrices with respect to the differential operators $D + 1$, $D + 4$ and multiply these two matrices. You should get the same thing. Why?

16. Show that if $L \in \mathcal{L}(V, W)$ (linear transformation) where $V$ and $W$ are vector spaces, then if $L\mathbf{y}_p = \mathbf{f}$ for some $\mathbf{y}_p \in V$, then the general solution of $L\mathbf{y} = \mathbf{f}$ is of the form
\[ \ker(L) + \mathbf{y}_p. \]
17. Let $L \in \mathcal{L}(V, W)$ where $V, W$ are vector spaces, finite or infinite dimensional, and define $\mathbf{x} \sim \mathbf{y}$ if $\mathbf{x} - \mathbf{y} \in \ker(L)$. Show that $\sim$ is an equivalence relation. Next define addition and scalar multiplication on the space of equivalence classes as follows.
\[ [\mathbf{x}] + [\mathbf{y}] \equiv [\mathbf{x} + \mathbf{y}], \qquad \alpha[\mathbf{x}] \equiv [\alpha\mathbf{x}] \]
Show that these are well defined definitions and that the set of equivalence classes is a vector space with respect to these operations. The zero is $[\ker L]$. Denote the resulting vector space by $V/\ker(L)$. Now suppose $L$ is onto $W$. Define a mapping $A : V/\ker(L) \to W$ as follows.
\[ A[\mathbf{x}] \equiv L\mathbf{x} \]
Show that $A$ is well defined, one to one and onto.

18. If $V$ is a finite dimensional vector space and $L \in \mathcal{L}(V, V)$, show that the minimal polynomial for $L$ equals the minimal polynomial of $A$ where $A$ is the $n \times n$ matrix of $L$ with respect to some basis.
19. Let $A$ be an $n \times n$ matrix. Describe a fairly simple method based on row operations for computing the minimal polynomial of $A$. Recall that this is a monic polynomial $p(\lambda)$ such that $p(A) = 0$ and it has smallest degree of all such monic polynomials. Hint: Consider $I, A, A^2, \cdots$. Regard each as a vector in $\mathbb{F}^{n^2}$ and consider taking the row reduced echelon form or something like this. You might also use the Cayley Hamilton theorem to note that you can stop the above sequence at $A^n$.

20. Let $A$ be an $n \times n$ matrix which is non defective. That is, there exists a basis of eigenvectors. Show that if $p(\lambda)$ is the minimal polynomial, then $p(\lambda)$ has no repeated roots. Hint: First show that the minimal polynomial of $A$ is the same as the minimal polynomial of the diagonal matrix
\[ D = \begin{pmatrix} D(\lambda_1) & & \\ & \ddots & \\ & & D(\lambda_r) \end{pmatrix} \]
where $D(\lambda)$ is a diagonal matrix having $\lambda$ down the main diagonal and in the above, the $\lambda_i$ are distinct. Show that the minimal polynomial is $\prod_{i=1}^{r} (\lambda - \lambda_i)$.

21. Show that if $A$ is an $n \times n$ matrix and the minimal polynomial has no repeated roots, then $A$ is non defective and there exists a basis of eigenvectors. Thus, from the above problem, a matrix may be diagonalized if and only if its minimal polynomial has no repeated roots. (It turns out this condition is something which is relatively easy to determine. You look at the polynomial and its derivative and ask whether these are relatively prime. The answer to this question can be determined using routine algorithms as discussed above in the section on polynomials and fields. Thus it is possible to determine whether an $n \times n$ matrix is defective.) Hint: You might want to use Theorem 17.3.1.
The Jordan Canonical Form*

Recall Corollary 17.4.4. For convenience, this corollary is stated below.

Corollary A.0.1 Let $A$ be an $n \times n$ matrix. Then $A$ is similar to an upper triangular, block diagonal matrix of the form
\[ T \equiv \begin{pmatrix} T_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & T_r \end{pmatrix} \]
where $T_k$ is an upper triangular matrix having only $\lambda_k$ on the main diagonal. The diagonal blocks can be arranged in any order desired. If $T_k$ is an $m_k \times m_k$ matrix, then
\[ m_k = \dim \ker (A - \lambda_k I)^{r_k} \]
where the minimal polynomial of $A$ is
\[ \prod_{k=1}^{r} (\lambda - \lambda_k)^{r_k}. \]
The Jordan canonical form involves a further reduction in which the upper triangular matrices $T_k$ assume a particularly revealing and simple form.

Definition A.0.2 $J_k(\alpha)$ is a Jordan block if it is a $k \times k$ matrix of the form
\[ J_k(\alpha) = \begin{pmatrix} \alpha & 1 & & 0 \\ & \ddots & \ddots & \\ & & \ddots & 1 \\ 0 & & & \alpha \end{pmatrix}. \]
In words, there is an unbroken string of ones down the super diagonal and the number $\alpha$ filling every space on the main diagonal with zeros everywhere else. A matrix is strictly upper triangular if it is of the form
\[ \begin{pmatrix} 0 & * & * \\ & \ddots & * \\ 0 & & 0 \end{pmatrix} \]
where there are zeroes on the main diagonal and below the main diagonal.
The Jordan canonical form involves each of the upper triangular matrices in the conclusion of Corollary 17.4.4 being a block diagonal matrix with the blocks being Jordan blocks in which the size of the blocks decreases from the upper left to the lower right. The idea is to show that every square matrix is similar to a unique such matrix which is in Jordan canonical form. It is assumed here that the field of scalars is $\mathbb{C}$ but everything which will be done below works just fine if the minimal polynomial can be completely factored into linear factors in the field of scalars.

Note that in the conclusion of Corollary 17.4.4 each of the triangular matrices is of the form $\alpha I + N$ where $N$ is a strictly upper triangular matrix. The existence of the Jordan canonical form follows quickly from the following lemma.

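Lemma A.0.3 Let $N$ be an $n \times n$ matrix which is strictly upper triangular. Then there exists an invertible matrix $S$ such that
\[ S^{-1}NS = \begin{pmatrix} J_{r_1}(0) & & & 0 \\ & J_{r_2}(0) & & \\ & & \ddots & \\ 0 & & & J_{r_s}(0) \end{pmatrix} \]
where $r_1 \geq r_2 \geq \cdots \geq r_s \geq 1$ and $\sum_{i=1}^{s} r_i = n$.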

Proof: First note the only eigenvalue of $N$ is 0. Let $\mathbf{v}_1$ be an eigenvector. Then $\{\mathbf{v}_1, \mathbf{v}_2, \cdots, \mathbf{v}_r\}$ is called a chain if $N\mathbf{v}_{k+1} = \mathbf{v}_k$ for all $k = 1, 2, \cdots, r - 1$ and $\mathbf{v}_1$ is an eigenvector so $N\mathbf{v}_1 = \mathbf{0}$. It will be called a maximal chain if there is no solution $\mathbf{v}$ to the equation $N\mathbf{v} = \mathbf{v}_r$.

Claim 1: The vectors in any chain are linearly independent and for
\[ \{\mathbf{v}_1, \mathbf{v}_2, \cdots, \mathbf{v}_r\} \]
a chain based on $\mathbf{v}_1$,
\[ N : \operatorname{span}(\mathbf{v}_1, \mathbf{v}_2, \cdots, \mathbf{v}_r) \to \operatorname{span}(\mathbf{v}_1, \mathbf{v}_2, \cdots, \mathbf{v}_r). \tag{1.1} \]
Also if $\{\mathbf{v}_1, \mathbf{v}_2, \cdots, \mathbf{v}_r\}$ is a chain, then $r \leq n$.

Proof: First note that 1.1 is obvious because
\[ N \sum_{i=1}^{r} c_i \mathbf{v}_i = \sum_{i=2}^{r} c_i \mathbf{v}_{i-1}. \]
It only remains to verify the vectors of a chain are independent. Suppose then
\[ \sum_{k=1}^{r} c_k \mathbf{v}_k = \mathbf{0}. \]
Do $N^{r-1}$ to it to conclude $c_r = 0$. Next do $N^{r-2}$ to it to conclude $c_{r-1} = 0$ and continue this way. Now it is obvious $r \leq n$ because the chain is independent. This proves the claim.

Consider the set of all chains based on eigenvectors. Since all have total length no larger than $n$ it follows there exists one which has maximal length, $\{\mathbf{v}_1^1, \cdots, \mathbf{v}_{r_1}^1\} \equiv B_1$. If $\operatorname{span}(B_1)$ contains all eigenvectors of $N$, then stop. Otherwise, consider all chains based on eigenvectors not in $\operatorname{span}(B_1)$ and pick one, $B_2 \equiv \{\mathbf{v}_1^2, \cdots, \mathbf{v}_{r_2}^2\}$, which is as long as possible. Thus $r_2 \leq r_1$. If $\operatorname{span}(B_1, B_2)$ contains all eigenvectors of $N$, stop. Otherwise, consider all chains based on eigenvectors not in $\operatorname{span}(B_1, B_2)$ and pick one, $B_3 \equiv \{\mathbf{v}_1^3, \cdots, \mathbf{v}_{r_3}^3\}$, such that $r_3$ is as large as possible. Continue this way. Thus $r_k \geq r_{k+1}$.

Claim 2: The above process terminates with a finite list of chains
\[ \{B_1, \cdots, B_s\} \]
because for any $k$, $\{B_1, \cdots, B_k\}$ is linearly independent.

Proof of Claim 2: The claim is true if $k = 1$. This follows from Claim 1. Suppose it is true for $k - 1$, $k \geq 2$. Then $\{B_1, \cdots, B_{k-1}\}$ is linearly independent. Suppose
\[ \sum_{q=1}^{p} c_q \mathbf{w}_q = \mathbf{0}, \quad c_q \neq 0 \]
where the $\mathbf{w}_q$ come from $\{B_1, \cdots, B_{k-1}, B_k\}$. By induction, some of these $\mathbf{w}_q$ must come from $B_k$. Let $\mathbf{v}_i^k$ be the one for which $i$ is as large as possible. Then do $N^{i-1}$ to both sides to obtain $\mathbf{v}_1^k$, the eigenvector upon which the chain $B_k$ is based, is a linear combination of $\{B_1, \cdots, B_{k-1}\}$ contrary to the construction. Since $\{B_1, \cdots, B_k\}$ is linearly independent, the process terminates. This proves the claim.

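Claim 3: Suppose $N\mathbf{w} = \mathbf{0}$. ($\mathbf{w}$ is an eigenvector.) Then there exist scalars $c_i$ such that
\[ \mathbf{w} = \sum_{i=1}^{s} c_i \mathbf{v}_1^i. \]
Recall that $\mathbf{v}_1^i$ is the eigenvector in the $i$th chain on which this chain is based.

Proof of Claim 3: From the construction, $\mathbf{w} \in \operatorname{span}(B_1, \cdots, B_s)$ since otherwise, it could serve as a base for another chain. Therefore,
\[ \mathbf{w} = \sum_{i=1}^{s} \sum_{k=1}^{r_i} c_i^k \mathbf{v}_k^i. \]
Now apply $N$ to both sides.
\[ \mathbf{0} = \sum_{i=1}^{s} \sum_{k=2}^{r_i} c_i^k \mathbf{v}_{k-1}^i \]
and so by Claim 2, $c_i^k = 0$ if $k \geq 2$. Therefore,
\[ \mathbf{w} = \sum_{i=1}^{s} c_i^1 \mathbf{v}_1^i \]
and this proves the claim.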

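It remains to verify that $\operatorname{span}(B_1, \cdots, B_s) = \mathbb{F}^n$. Suppose $\mathbf{w} \notin \operatorname{span}(B_1, \cdots, B_s)$. By Claim 3 this implies $\mathbf{w}$ is not an eigenvector since all the eigenvectors are in $\operatorname{span}(B_1, \cdots, B_s)$. Since $N^n = 0$, there exists a smallest integer $k \geq 2$ such that $N^k\mathbf{w} = \mathbf{0}$ but $N^{k-1}\mathbf{w} \neq \mathbf{0}$. Then $k \leq \min(r_1, \cdots, r_s)$ because there exists a chain of length $k$ based on the eigenvector $N^{k-1}\mathbf{w}$, namely
\[ N^{k-1}\mathbf{w}, N^{k-2}\mathbf{w}, N^{k-3}\mathbf{w}, \cdots, \mathbf{w} \]
and this chain must be no longer than the preceding chains because of the construction in which a longest possible chain was chosen at each step. Since $N^{k-1}\mathbf{w}$ is an eigenvector, it follows from Claim 3 that
\[ N^{k-1}\mathbf{w} = \sum_{i=1}^{s} c_i \mathbf{v}_1^i = \sum_{i=1}^{s} c_i N^{k-1} \mathbf{v}_k^i. \]
Therefore,
\[ N^{k-1}\left( \mathbf{w} - \sum_{i=1}^{s} c_i \mathbf{v}_k^i \right) = \mathbf{0} \]
and so,
\[ N N^{k-2}\left( \mathbf{w} - \sum_{i=1}^{s} c_i \mathbf{v}_k^i \right) = \mathbf{0} \]
which implies $N^{k-2}\left( \mathbf{w} - \sum_{i=1}^{s} c_i \mathbf{v}_k^i \right)$ is an eigenvector and so by Claim 3 there exist $d_i$ such that
\[ N^{k-2}\left( \mathbf{w} - \sum_{i=1}^{s} c_i \mathbf{v}_k^i \right) = \sum_{i=1}^{s} d_i \mathbf{v}_1^i = \sum_{i=1}^{s} d_i N^{k-2} \mathbf{v}_{k-1}^i \]
and so
\[ N^{k-2}\left( \mathbf{w} - \sum_{i=1}^{s} c_i \mathbf{v}_k^i - \sum_{i=1}^{s} d_i \mathbf{v}_{k-1}^i \right) = \mathbf{0}. \]
Continuing this way it follows that for each $j < k$, there exists a vector
\[ \mathbf{z}_j \in \operatorname{span}(B_1, \cdots, B_s) \]
such that
\[ N^{k-j}(\mathbf{w} - \mathbf{z}_j) = \mathbf{0}. \]
In particular, taking $j = k - 1$ yields
\[ N(\mathbf{w} - \mathbf{z}_{k-1}) = \mathbf{0} \]
and now using Claim 3 again yields $\mathbf{w} \in \operatorname{span}(B_1, \cdots, B_s)$, a contradiction. Therefore,
\[ \operatorname{span}(B_1, \cdots, B_s) = \mathbb{F}^n \]
after all and so $\{B_1, \cdots, B_s\}$ is a basis for $\mathbb{F}^n$.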
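Now consider the block matrix

\[ S = \begin{pmatrix} B_1 & \cdots & B_s \end{pmatrix} \]

where

\[ B_k = \begin{pmatrix} v_1^k & \cdots & v_{r_k}^k \end{pmatrix}. \]

Thus

\[ S^{-1} = \begin{pmatrix} C_1 \\ \vdots \\ C_s \end{pmatrix} \]

where $C_i B_i = I_{r_i \times r_i}$ and $C_i N B_j = 0$ if $i \neq j$. Let

\[ C_k = \begin{pmatrix} u_1^T \\ \vdots \\ u_{r_k}^T \end{pmatrix}. \]

Then

\[ C_k N B_k = \begin{pmatrix} u_1^T \\ \vdots \\ u_{r_k}^T \end{pmatrix} \begin{pmatrix} N v_1^k & \cdots & N v_{r_k}^k \end{pmatrix} = \begin{pmatrix} u_1^T \\ \vdots \\ u_{r_k}^T \end{pmatrix} \begin{pmatrix} 0 & v_1^k & \cdots & v_{r_k - 1}^k \end{pmatrix} \]

which equals an $r_k \times r_k$ matrix of the form

\[ J_{r_k}(0) = \begin{pmatrix} 0 & 1 & & 0 \\ & \ddots & \ddots & \\ & & \ddots & 1 \\ 0 & & & 0 \end{pmatrix}. \]

That is, it has ones down the super diagonal and zeros everywhere else. It follows

\[ S^{-1} N S = \begin{pmatrix} C_1 \\ \vdots \\ C_s \end{pmatrix} N \begin{pmatrix} B_1 & \cdots & B_s \end{pmatrix} = \begin{pmatrix} J_{r_1}(0) & & & 0 \\ & J_{r_2}(0) & & \\ & & \ddots & \\ 0 & & & J_{r_s}(0) \end{pmatrix} \]

as claimed. ■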

Now let the upper triangular matrices $T_k$ be given in the conclusion of Corollary 17.4.4. Thus, as noted earlier,

\[ T_k = \lambda_k I_{r_k \times r_k} + N_k \]

where $N_k$ is a strictly upper triangular matrix of the sort just discussed in Lemma A.0.3. Therefore, there exists $S_k$ such that $S_k^{-1} N_k S_k$ is of the form given in Lemma A.0.3. Now $S_k^{-1} \lambda_k I_{r_k \times r_k} S_k = \lambda_k I_{r_k \times r_k}$ and so $S_k^{-1} T_k S_k$ is of the form

\[ \begin{pmatrix} J_{i_1}(\lambda_k) & & & 0 \\ & J_{i_2}(\lambda_k) & & \\ & & \ddots & \\ 0 & & & J_{i_s}(\lambda_k) \end{pmatrix} \]

where $i_1 \geq i_2 \geq \cdots \geq i_s$ and $\sum_{j=1}^{s} i_j = r_k$. This proves the following corollary.

Corollary A.0.4 Suppose $A$ is an upper triangular $n \times n$ matrix having $\alpha$ in every position on the main diagonal. Then there exists an invertible matrix $S$ such that

\[ S^{-1} A S = \begin{pmatrix} J_{k_1}(\alpha) & & & 0 \\ & J_{k_2}(\alpha) & & \\ & & \ddots & \\ 0 & & & J_{k_r}(\alpha) \end{pmatrix} \]

where $k_1 \geq k_2 \geq \cdots \geq k_r \geq 1$ and $\sum_{i=1}^{r} k_i = n$.

The next theorem gives the existence of the Jordan canonical form.

Theorem A.0.5 Let $A$ be an $n \times n$ matrix having eigenvalues $\lambda_1, \cdots, \lambda_r$ where the multiplicity of $\lambda_i$ as a zero of the characteristic polynomial equals $m_i$. Then there exists an invertible matrix $S$ such that

\[ S^{-1} A S = \begin{pmatrix} J(\lambda_1) & & 0 \\ & \ddots & \\ 0 & & J(\lambda_r) \end{pmatrix} \qquad (1.2) \]

where $J(\lambda_k)$ is an $m_k \times m_k$ matrix of the form

\[ \begin{pmatrix} J_{k_1}(\lambda_k) & & & 0 \\ & J_{k_2}(\lambda_k) & & \\ & & \ddots & \\ 0 & & & J_{k_r}(\lambda_k) \end{pmatrix} \qquad (1.3) \]

where $k_1 \geq k_2 \geq \cdots \geq k_r \geq 1$ and $\sum_{i=1}^{r} k_i = m_k$.

Proof: From Corollary 17.4.4, there exists $S$ such that $S^{-1} A S$ is of the form

\[ T = \begin{pmatrix} T_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & T_r \end{pmatrix} \]

where $T_k$ is an upper triangular $m_k \times m_k$ matrix having only $\lambda_k$ on the main diagonal. By Corollary A.0.4 there exist matrices $S_k$ such that $S_k^{-1} T_k S_k = J(\lambda_k)$ where $J(\lambda_k)$ is described in 1.3. Now let $M$ be the block diagonal matrix given by

\[ M = \begin{pmatrix} S_1 & & 0 \\ & \ddots & \\ 0 & & S_r \end{pmatrix}. \]

It follows that $M^{-1} S^{-1} A S M = M^{-1} T M$ and this is of the desired form. ■

What about the uniqueness of the Jordan canonical form? Obviously if you change the order of the eigenvalues, you get a different Jordan canonical form but it turns out that if the order of the eigenvalues is the same, then the Jordan canonical form is unique. In fact, it is the same for any two similar matrices.

Theorem A.0.6 Let $A$ and $B$ be two similar matrices. Let $J_A$ and $J_B$ be Jordan forms of $A$ and $B$ respectively, made up of the blocks $J_A(\lambda_i)$ and $J_B(\lambda_i)$ respectively. Then $J_A$ and $J_B$ are identical except possibly for the order of the $J(\lambda_i)$ where the $\lambda_i$ are defined above.

Proof: First note that for $\lambda_i$ an eigenvalue, the matrices $J_A(\lambda_i)$ and $J_B(\lambda_i)$ are both of size $m_i \times m_i$ because the two matrices $A$ and $B$, being similar, have exactly the same characteristic equation and the size of a block equals the algebraic multiplicity of the eigenvalue as a zero of the characteristic equation. It is only necessary to worry about the number and size of the Jordan blocks making up $J_A(\lambda_i)$ and $J_B(\lambda_i)$. Let the eigenvalues of $A$ and $B$ be $\{\lambda_1, \cdots, \lambda_r\}$. Consider the two sequences of numbers $\{\operatorname{rank}(A - \lambda I)^m\}$ and $\{\operatorname{rank}(B - \lambda I)^m\}$. Since $A$ and $B$ are similar, these two sequences coincide. (Why?) Also, for the same reason, $\{\operatorname{rank}(J_A - \lambda I)^m\}$ coincides with $\{\operatorname{rank}(J_B - \lambda I)^m\}$. Now pick $\lambda_k$ an eigenvalue and consider $\{\operatorname{rank}(J_A - \lambda_k I)^m\}$ and $\{\operatorname{rank}(J_B - \lambda_k I)^m\}$. Then

\[ J_A - \lambda_k I = \begin{pmatrix} J_A(\lambda_1 - \lambda_k) & & & & 0 \\ & \ddots & & & \\ & & J_A(0) & & \\ & & & \ddots & \\ 0 & & & & J_A(\lambda_r - \lambda_k) \end{pmatrix} \]

and a similar formula holds for $J_B - \lambda_k I$. Here

\[ J_A(0) = \begin{pmatrix} J_{k_1}(0) & & & 0 \\ & J_{k_2}(0) & & \\ & & \ddots & \\ 0 & & & J_{k_r}(0) \end{pmatrix} \]

and

\[ J_B(0) = \begin{pmatrix} J_{l_1}(0) & & & 0 \\ & J_{l_2}(0) & & \\ & & \ddots & \\ 0 & & & J_{l_p}(0) \end{pmatrix} \]

and it suffices to verify that $l_i = k_i$ for all $i$. As noted above, $\sum k_i = \sum l_i$. Now from the above formulas,

\[ \operatorname{rank}(J_A - \lambda_k I)^m = \sum_{i \neq k} m_i + \operatorname{rank}(J_A(0)^m) \]
\[ = \sum_{i \neq k} m_i + \operatorname{rank}(J_B(0)^m) \]
\[ = \operatorname{rank}(J_B - \lambda_k I)^m, \]

which shows $\operatorname{rank}(J_A(0)^m) = \operatorname{rank}(J_B(0)^m)$ for all $m$. However,

\[ J_B(0)^m = \begin{pmatrix} J_{l_1}(0)^m & & & 0 \\ & J_{l_2}(0)^m & & \\ & & \ddots & \\ 0 & & & J_{l_p}(0)^m \end{pmatrix} \]

with a similar formula holding for $J_A(0)^m$, and $\operatorname{rank}(J_B(0)^m) = \sum_{i=1}^{p} \operatorname{rank}(J_{l_i}(0)^m)$, with a similar formula for $\operatorname{rank}(J_A(0)^m)$. In going from $m$ to $m + 1$,

\[ \operatorname{rank}(J_{l_i}(0)^m) - 1 = \operatorname{rank}(J_{l_i}(0)^{m+1}) \]

until $m = l_i$ at which time there is no further change. Therefore, $p = r$ since otherwise, there would exist a discrepancy right away in going from $m = 1$ to $m = 2$. Now suppose the sequence $\{l_i\}$ is not equal to the sequence $\{k_i\}$. Then $l_{r-b} \neq k_{r-b}$ for some $b$ a nonnegative integer taken to be as small as possible. Say $l_{r-b} > k_{r-b}$. Then, letting $m = k_{r-b}$,

\[ \sum_{i=1}^{r} \operatorname{rank}(J_{l_i}(0)^m) = \sum_{i=1}^{r} \operatorname{rank}(J_{k_i}(0)^m) \]

and in going to $m + 1$ a discrepancy must occur because the sum on the right will contribute less to the decrease in rank than the sum on the left. ■
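
The rank argument in this proof can be turned into a small computation. The sketch below is only an illustration under stated assumptions: it takes a nilpotent matrix $N$ with integer or rational entries, computes $\operatorname{rank}(N^m)$ exactly, and uses the fact that the number of Jordan blocks of size at least $m$ equals $\operatorname{rank}(N^{m-1}) - \operatorname{rank}(N^m)$. The function names (`rank`, `nilpotent_block_sizes`, and the helpers) are hypothetical and do not come from the text.

```python
from fractions import Fraction


def rank(M):
    """Rank of a matrix, computed by exact Gaussian elimination."""
    A = [[Fraction(x) for x in row] for row in M]
    rows, cols = len(A), len(A[0])
    r = 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if A[i][c] != 0), None)
        if pivot is None:
            continue
        A[r], A[pivot] = A[pivot], A[r]
        for i in range(r + 1, rows):
            f = A[i][c] / A[r][c]
            A[i] = [a - f * b for a, b in zip(A[i], A[r])]
        r += 1
        if r == rows:
            break
    return r


def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def jordan_block_zero(r):
    """The r x r block J_r(0): ones on the super diagonal, zeros elsewhere."""
    return [[int(j == i + 1) for j in range(r)] for i in range(r)]


def direct_sum(blocks):
    """Block diagonal matrix built from square blocks."""
    n = sum(len(b) for b in blocks)
    M = [[0] * n for _ in range(n)]
    off = 0
    for b in blocks:
        for i, row in enumerate(b):
            for j, x in enumerate(row):
                M[off + i][off + j] = x
        off += len(b)
    return M


def nilpotent_block_sizes(N):
    """Jordan block sizes of a nilpotent N, largest first, from rank(N^m)."""
    n = len(N)
    ranks = [n]  # rank(N^0) = n
    P = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(n):
        P = matmul(P, N)
        ranks.append(rank(P))
        if ranks[-1] == 0:
            break
    if ranks[-1] != 0:
        raise ValueError("matrix is not nilpotent")
    # at_least[m-1] = number of blocks of size >= m
    at_least = [ranks[m - 1] - ranks[m] for m in range(1, len(ranks))]
    sizes = []
    for m in range(len(at_least), 0, -1):
        larger = at_least[m] if m < len(at_least) else 0
        sizes.extend([m] * (at_least[m - 1] - larger))
    return sizes


if __name__ == "__main__":
    N = direct_sum([jordan_block_zero(3), jordan_block_zero(2), jordan_block_zero(2)])
    print(nilpotent_block_sizes(N))
```

Because the block sizes are read off from ranks alone, and ranks are preserved by similarity, two similar nilpotent matrices necessarily produce the same list, which is the content of the uniqueness theorem in this special case.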
The Fundamental Theorem Of Algebra

The fundamental theorem of algebra states that every non constant polynomial having coefficients in $\mathbb{C}$ has a zero in $\mathbb{C}$. If $\mathbb{C}$ is replaced by $\mathbb{R}$, this is not true because of the example, $x^2 + 1 = 0$. This theorem is a very remarkable result and notwithstanding its title, all the best proofs of it depend on either analysis or topology. It was first mostly proved by Gauss in 1797. The first complete proof was given by Argand in 1806. The proof given here follows Rudin [13]. See also Hardy [7] for a similar proof, more discussion and references. The best proof is found in the theory of complex analysis. Recall De Moivre's theorem from trigonometry which is listed here for convenience.

Theorem B.0.7 Let $r > 0$ be given. Then if $n$ is a positive integer,

\[ [r(\cos t + i \sin t)]^n = r^n(\cos nt + i \sin nt). \]

Recall that this theorem is the basis for proving the following corollary from trigonometry, also listed here for convenience.

Corollary B.0.8 Let $z$ be a non zero complex number and let $k$ be a positive integer. Then there are always exactly $k$ $k$th roots of $z$ in $\mathbb{C}$.

Lemma B.0.9 Let $a_k \in \mathbb{C}$ for $k = 1, \cdots, n$ and let $p(z) \equiv \sum_{k=1}^{n} a_k z^k$. Then $p$ is continuous.

Proof:

\[ |a z^n - a w^n| \leq |a| \, |z - w| \, |z^{n-1} + z^{n-2} w + \cdots + w^{n-1}|. \]

Then for $|z - w| < 1$, the triangle inequality implies $|w| < 1 + |z|$ and so if $|z - w| < 1$,

\[ |a z^n - a w^n| \leq |a| \, |z - w| \, n (1 + |z|)^n. \]

If $\varepsilon > 0$ is given, let

\[ \delta < \min\left( 1, \frac{\varepsilon}{|a| \, n (1 + |z|)^n} \right). \]

It follows from the above inequality that for $|z - w| < \delta$, $|a z^n - a w^n| < \varepsilon$. The function of the lemma is just the sum of functions of this sort and so it follows that it is also continuous. ■
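
As a numerical sanity check of the estimate used in this proof, the following sketch compares $|az^n - aw^n|$ with the bound $|a||z - w| n (1 + |z|)^n$ for points $w$ within distance 1 of $z$, and computes a $\delta$ of the kind chosen in the proof. The function names and the sample values are assumptions made for illustration only, and the sketch assumes $a \neq 0$.

```python
def monomial_gap(a, z, w, n):
    """|a z^n - a w^n| for complex a, z, w and a positive integer n."""
    return abs(a * z**n - a * w**n)


def lemma_bound(a, z, w, n):
    """The bound |a| |z - w| n (1 + |z|)^n, valid when |z - w| < 1."""
    return abs(a) * abs(z - w) * n * (1 + abs(z)) ** n


def delta_for(a, z, n, eps):
    """A delta with |z - w| < delta implying |a z^n - a w^n| < eps (a != 0)."""
    return 0.5 * min(1.0, eps / (abs(a) * n * (1 + abs(z)) ** n))


if __name__ == "__main__":
    a, z, n = complex(2, 1), complex(1.5, -0.5), 4
    for w in (z + 0.3, z - 0.2j, z + complex(0.5, 0.5)):
        print(monomial_gap(a, z, w, n) <= lemma_bound(a, z, w, n))
```

The factor 0.5 in `delta_for` is just one convenient way to satisfy the strict inequality $\delta < \min(\cdot)$; any factor in $(0, 1)$ would do.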

Theorem B.0.10 (Fundamental theorem of Algebra) Let $p(z)$ be a nonconstant polynomial. Then there exists $z \in \mathbb{C}$ such that $p(z) = 0$.

Proof: Suppose not. Then

\[ p(z) = \sum_{k=0}^{n} a_k z^k \]

where $a_n \neq 0$, $n > 0$. Then

\[ |p(z)| \geq |a_n| \, |z|^n - \sum_{k=0}^{n-1} |a_k| \, |z|^k \]

and so

\[ \lim_{|z| \to \infty} |p(z)| = \infty. \qquad (2.1) \]

Now let

\[ \lambda \equiv \inf \{ |p(z)| : z \in \mathbb{C} \}. \]

By 2.1, there exists an $R > 0$ such that if $|z| > R$, it follows that $|p(z)| > \lambda + 1$. Therefore,

\[ \lambda \equiv \inf \{ |p(z)| : z \in \mathbb{C} \} = \inf \{ |p(z)| : |z| \leq R \}. \]

The set $\{ z : |z| \leq R \}$ is a closed and bounded set and so this infimum is achieved at some point $w$ with $|w| \leq R$. A contradiction is obtained if $|p(w)| = 0$ so assume $|p(w)| > 0$. Then consider

\[ q(z) \equiv \frac{p(z + w)}{p(w)}. \]

It follows $q(z)$ is of the form

\[ q(z) = 1 + c_k z^k + \cdots + c_n z^n \]

where $c_k \neq 0$, because $q(0) = 1$. It is also true that $|q(z)| \geq 1$ by the assumption that $|p(w)|$ is the smallest value of $|p(z)|$. Now let $\theta \in \mathbb{C}$ be a complex number with $|\theta| = 1$ and

\[ \theta c_k w^k = -|w|^k |c_k|. \]

If $w \neq 0$,

\[ \theta = \frac{-|w^k| \, |c_k|}{w^k c_k} \]

and if $w = 0$, $\theta = 1$ will work. Now let $\eta^k = \theta$ and let $t$ be a small positive number.

\[ q(t \eta w) = 1 - t^k |w|^k |c_k| + \cdots + c_n t^n (\eta w)^n \]

which is of the form

\[ 1 - t^k |w|^k |c_k| + t^k (g(t, w)) \]

where $\lim_{t \to 0} g(t, w) = 0$. Letting $t$ be small enough,

\[ |g(t, w)| < |w|^k |c_k| / 2 \]

and so for such $t$,

\[ |q(t \eta w)| < 1 - t^k |w|^k |c_k| + t^k |w|^k |c_k| / 2 < 1, \]

a contradiction to $|q(z)| \geq 1$. ■
C.1 Exercises 24

1 Let $z = 5 + i9$. Find $z^{-1}$.

$(5 + i9)^{-1} = \frac{5}{106} - \frac{9}{106} i$

2 Let $z = 2 + i7$ and let $w = 3 - i8$. Find $zw$, $z + w$, $z^2$, and $w/z$.

$62 + 5i$, $5 - i$, $-45 + 28i$, and $-\frac{50}{53} - \frac{37}{53} i$.
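
The answers to Problems 1 and 2 can be checked with exact arithmetic. The sketch below represents a complex number $a + ib$ as a pair `(a, b)` of rationals; the helper names `c_add`, `c_mul`, and `c_inv` are hypothetical and chosen only for this illustration.

```python
from fractions import Fraction


def c_add(z, w):
    """(a + ib) + (c + id) as a pair."""
    return (z[0] + w[0], z[1] + w[1])


def c_mul(z, w):
    """(a + ib)(c + id) = (ac - bd) + i(ad + bc)."""
    (a, b), (c, d) = z, w
    return (a * c - b * d, a * d + b * c)


def c_inv(z):
    """1 / (a + ib) = (a - ib) / (a^2 + b^2), for z != 0."""
    a, b = z
    n = a * a + b * b
    return (Fraction(a) / n, Fraction(-b) / n)


if __name__ == "__main__":
    z, w = (2, 7), (3, -8)
    print(c_inv((5, 9)))          # Problem 1
    print(c_mul(z, w))            # zw
    print(c_add(z, w))            # z + w
    print(c_mul(z, z))            # z^2
    print(c_mul(w, c_inv(z)))     # w / z
```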

4 Graph the complex cube roots of $-8$ in the complex plane. Do the same for the four fourth roots of $-16$.

The cube roots are the solutions to $z^3 + 8 = 0$. Solution is: $i\sqrt{3} + 1$, $1 - i\sqrt{3}$, $-2$.

The fourth roots are the solutions to $z^4 + 16 = 0$. Solution is:

$(1 - i)\sqrt{2}$, $-(1 + i)\sqrt{2}$, $-(1 - i)\sqrt{2}$, $(1 + i)\sqrt{2}$.

When you graph these, you will have three equally spaced points on the circle of radius 2 for the cube roots and you will have four equally spaced points on the circle of radius 2 for the fourth roots. Here are pictures which should result.
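
The roots listed above can also be generated directly from De Moivre's theorem: if $z = r(\cos t + i \sin t)$, the $k$th roots are $r^{1/k}(\cos((t + 2\pi j)/k) + i \sin((t + 2\pi j)/k))$ for $j = 0, \cdots, k - 1$. The following is a minimal floating point sketch of that formula; the function name `kth_roots` is an assumption made for illustration.

```python
import cmath
import math


def kth_roots(z, k):
    """All k complex k-th roots of a nonzero z, via the polar form of z."""
    r, theta = cmath.polar(z)
    return [
        cmath.rect(r ** (1.0 / k), (theta + 2 * math.pi * j) / k)
        for j in range(k)
    ]


if __name__ == "__main__":
    for root in kth_roots(-8, 3):
        print(root, abs(root))
    for root in kth_roots(-16, 4):
        print(root, abs(root))
```

Each printed modulus should be 2 up to rounding, which is why the plotted points lie on the circle of radius 2.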





5 If $z$ is a complex number, show there exists $\omega$ a complex number with $|\omega| = 1$ and $\omega z = |z|$.

If $z = 0$, let $\omega = 1$. If $z \neq 0$, let $\omega = \dfrac{\overline{z}}{|z|}$.

7 You already know formulas for $\cos(x + y)$ and $\sin(x + y)$ and these were used to prove De Moivre's theorem. Now using De Moivre's theorem, derive a formula for $\sin(5x)$ and one for $\cos(5x)$.

$\sin(5x) = 5\cos^4 x \sin x - 10\cos^2 x \sin^3 x + \sin^5 x$

$\cos(5x) = \cos^5 x - 10\cos^3 x \sin^2 x + 5\cos x \sin^4 x$

9 Factor $x^3 + 8$ as a product of linear factors.

$x^3 + 8 = 0$, Solution is: $i\sqrt{3} + 1$, $1 - i\sqrt{3}$, $-2$ and so this polynomial equals

$(x + 2)\left(x - (i\sqrt{3} + 1)\right)\left(x - (1 - i\sqrt{3})\right)$

10 Write $x^3 + 27$ in the form $(x + 3)(x^2 + ax + b)$ where $x^2 + ax + b$ cannot be factored any more using only real numbers.

$x^3 + 27 = (x + 3)(x^2 - 3x + 9)$

12 Factor $x^4 + 16$ as the product of two quadratic polynomials each of which cannot be factored further without using complex numbers.

$x^4 + 16 = (x^2 - 2\sqrt{2}x + 4)(x^2 + 2\sqrt{2}x + 4)$. You can use the information in the preceding problem. Note that $(x - z)(x - \overline{z})$ has real coefficients.

13 If $z, w$ are complex numbers prove $\overline{zw} = \overline{z}\,\overline{w}$ and then show by induction that $\overline{z_1 \cdots z_m} = \overline{z_1} \cdots \overline{z_m}$. Also verify that $\overline{\sum_{k=1}^{m} z_k} = \sum_{k=1}^{m} \overline{z_k}$. In words this says the conjugate of a product equals the product of the conjugates and the conjugate of a sum equals the sum of the conjugates.



$\overline{(a + ib)(c + id)} = \overline{ac - bd + i(ad + bc)} = (ac - bd) - i(ad + bc)$

$(a - ib)(c - id) = ac - bd - i(ad + bc)$ which is the same thing. Thus it holds for a product of two complex numbers. Now suppose you have that it is true for the product of $n$ complex numbers. Then

$\overline{z_1 \cdots z_{n+1}} = \overline{z_1 \cdots z_n}\ \overline{z_{n+1}}$

and now, by induction this equals

$\overline{z_1} \cdots \overline{z_n}\ \overline{z_{n+1}}$

As to sums, this is even easier.

\[ \overline{\sum_{j=1}^{n} (x_j + i y_j)} = \overline{\sum_{j=1}^{n} x_j + i \sum_{j=1}^{n} y_j} = \sum_{j=1}^{n} x_j - i \sum_{j=1}^{n} y_j = \sum_{j=1}^{n} (x_j - i y_j) = \sum_{j=1}^{n} \overline{(x_j + i y_j)}. \]

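14 Suppose $p(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0$ where all the $a_k$ are real numbers. Suppose also that $p(z) = 0$ for some $z \in \mathbb{C}$. Show it follows that $p(\overline{z}) = 0$ also.

You just use the above problem. If $p(z) = 0$, then you have

\[ \overline{p(z)} = 0 = \overline{a_n z^n + a_{n-1} z^{n-1} + \cdots + a_1 z + a_0} \]
\[ = \overline{a_n z^n} + \overline{a_{n-1} z^{n-1}} + \cdots + \overline{a_1 z} + \overline{a_0} \]
\[ = \overline{a_n}\ \overline{z}^n + \overline{a_{n-1}}\ \overline{z}^{n-1} + \cdots + \overline{a_1}\ \overline{z} + \overline{a_0} \]
\[ = a_n \overline{z}^n + a_{n-1} \overline{z}^{n-1} + \cdots + a_1 \overline{z} + a_0 \]
\[ = p(\overline{z}) \]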

15 Show that $1 + i$, $2 + i$ are the only two zeros to

$p(x) = x^2 - (3 + 2i)x + (1 + 3i)$

so the zeros do not necessarily come in conjugate pairs if the coefficients are not real.

$(x - (1 + i))(x - (2 + i)) = x^2 - (3 + 2i)x + 1 + 3i$

16 I claim that $1 = -1$. Here is why.

$-1 = i^2 = \sqrt{-1}\sqrt{-1} = \sqrt{(-1)^2} = \sqrt{1} = 1$.

This is clearly a remarkable result but is there something wrong with it? If so, what is wrong?

Something is wrong. There is no single $\sqrt{-1}$.



17 De Moivre's theorem is really a grand thing. I plan to use it now for rational exponents, not just integers. $1 = 1^{(1/4)} = (\cos 2\pi + i \sin 2\pi)^{1/4} = \cos(\pi/2) + i \sin(\pi/2) = i$. Therefore, squaring both sides it follows $1 = -1$ as in the previous problem. What does this tell you about De Moivre's theorem? Is there a profound difference between raising numbers to integer powers and raising numbers to non integer powers?

It doesn't work. This is because there are four fourth roots of 1.

18 Here is another question: If $n$ is an integer, is it always true that $(\cos\theta - i \sin\theta)^n = \cos(n\theta) - i \sin(n\theta)$? Explain.

Yes, this is true.

$(\cos\theta - i \sin\theta)^n = (\cos(-\theta) + i \sin(-\theta))^n$
$= \cos(-n\theta) + i \sin(-n\theta)$
$= \cos(n\theta) - i \sin(n\theta)$

19 Suppose you have any polynomial in $\cos\theta$ and $\sin\theta$. By this I mean an expression of the form

$\sum_{\alpha=0}^{m} \sum_{\beta=0}^{n} a_{\alpha\beta} \cos^\alpha\theta \sin^\beta\theta$ where $a_{\alpha\beta} \in \mathbb{C}$. Can this always be written in the form

$\sum_{\gamma=-(n+m)}^{m+n} b_\gamma \cos\gamma\theta + \sum_{\tau=-(n+m)}^{n+m} c_\tau \sin\tau\theta$? Explain.

Yes it can. It follows from the identities for the sine and cosine of the sum and difference of angles that

$\sin a \sin b = \frac{1}{2}(\cos(a - b) - \cos(a + b))$

$\cos a \cos b = \frac{1}{2}(\cos(a + b) + \cos(a - b))$

$\sin a \cos b = \frac{1}{2}(\sin(a + b) + \sin(a - b))$

Now $\cos\theta = 1\cos\theta + 0\sin\theta$ and $\sin\theta = 0\cos\theta + 1\sin\theta$. Suppose that whenever $k \leq n$,

\[ \cos^k(\theta) = \sum_{j=-k}^{k} a_j \cos(j\theta) + b_j \sin(j\theta) \]

for some numbers $a_j, b_j$. Then

\[ \cos^{n+1}(\theta) = \sum_{j=-n}^{n} a_j \cos(\theta)\cos(j\theta) + b_j \cos(\theta)\sin(j\theta). \]

Now use the above identities to write all products as sums of sines and cosines of $(j - 1)\theta$, $j\theta$, $(j + 1)\theta$. Then adjusting the constants, it follows

\[ \cos^{n+1}(\theta) = \sum_{j=-(n+1)}^{n+1} a_j' \cos(j\theta) + b_j' \sin(j\theta). \]

You can do something similar with $\sin^n(\theta)$ and with products of the form

$\cos^\alpha\theta \sin^\beta\theta$.

20 Suppose $p(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0$ is a polynomial and it has $n$ zeros,

$z_1, z_2, \cdots, z_n$

listed according to multiplicity. ($z$ is a root of multiplicity $m$ if the polynomial $f(x) = (x - z)^m$ divides $p(x)$ but $(x - z) f(x)$ does not.) Show that

$p(x) = a_n (x - z_1)(x - z_2) \cdots (x - z_n)$.

$p(x) = (x - z_1) q(x) + r(x)$ where $r(x)$ is a nonzero constant or equal to 0. However, $r(z_1) = 0$ and so $r(x) = 0$. Now do to $q(x)$ what was done to $p(x)$ and continue until the degree of the resulting $q(x)$ equals 0. Then you have the above factorization.

21 Give the solutions to the following quadratic equations having real coefficients.

(a) $x^2 - 2x + 2 = 0$, Solution is: $1 + i$, $1 - i$

(b) $3x^2 + x + 3 = 0$, Solution is: $\frac{1}{6} i\sqrt{35} - \frac{1}{6}$, $-\frac{1}{6} i\sqrt{35} - \frac{1}{6}$

(c) $x^2 - 6x + 13 = 0$, Solution is: $3 + 2i$, $3 - 2i$

(d) $x^2 + 4x + 9 = 0$, Solution is: $i\sqrt{5} - 2$, $-i\sqrt{5} - 2$

(e) $4x^2 + 4x + 5 = 0$, Solution is: $-\frac{1}{2} + i$, $-\frac{1}{2} - i$

22 Give the solutions to the following quadratic equations having complex coefficients. Note how the solutions do not come in conjugate pairs as they do when the equation has real coefficients.

(a) $x^2 + 2x + 1 + i = 0$, Solution is: $x = -1 + \frac{1}{2}\sqrt{2} - \frac{1}{2} i\sqrt{2}$, $x = -1 - \frac{1}{2}\sqrt{2} + \frac{1}{2} i\sqrt{2}$

(b) $4x^2 + 4ix - 5 = 0$, Solution is: $x = 1 - \frac{1}{2} i$, $x = -1 - \frac{1}{2} i$

(c) $4x^2 + (4 + 4i)x + 1 + 2i = 0$, Solution is: $x = -\frac{1}{2}$, $x = -\frac{1}{2} - i$

(d) $x^2 - 4ix - 5 = 0$, Solution is: $x = -1 + 2i$, $x = 1 + 2i$

(e) $3x^2 + (1 - i)x + 3i = 0$, Solution is: $x = -\frac{1}{6} + \frac{1}{6}\sqrt{19} + \left(\frac{1}{6} - \frac{1}{6}\sqrt{19}\right) i$, $x = -\frac{1}{6} - \frac{1}{6}\sqrt{19} + \left(\frac{1}{6} + \frac{1}{6}\sqrt{19}\right) i$

23 Prove the fundamental theorem of algebra for quadratic polynomials having coefficients in $\mathbb{C}$.

This is pretty easy because you can simply write the quadratic formula. Finding the square roots of complex numbers is easy from the above presentation. Hence, every quadratic polynomial has two roots in $\mathbb{C}$. Note that the two square roots in the quadratic formula are on opposite sides of the unit circle so one is $-1$ times the other.

C.2 Exercises 45

1 50 i+^Sj

2 $\theta = 9.56$ degrees.

30(A V2j

3 Will need 68.966 gallons of gas. Therefore, it will not make it.

4 At $( 155 \quad 75\sqrt{3} + 150 ) = ( 155.0 \quad 279.9 )$

5 It takes 2 hours
13 miles. He ends up 1/3 mile down stream.

7 ( §? "3 ) In the second case, he could not do it.

8 $(-3, 2, -5)$.

9 $(3, 0, 0)$.

10 ( 5^+!^

11 $T = 50\sqrt{26}$.

n 2 5^

C.3 Exercises 57

1 This formula says that $u \cdot v = |u| |v| \cos\theta$ where $\theta$ is the included angle between the two vectors. Thus

\[ |u \cdot v| = |u| |v| |\cos\theta| \leq |u| |v| \]

and equality holds if and only if $\theta = 0$ or $\pi$. This means that the two vectors either point in the same direction or opposite directions. Hence one is a multiple of the other.

3 $-0.19739 = \cos\theta$, Thus $\theta = 1.7695$ radians.

4 $-0.44444 = \cos\theta$, $\theta = 2.0313$ radians.



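5 $\dfrac{u \cdot v}{u \cdot u} u = \dfrac{-5}{14}(1, 2, 3) = \left( -\dfrac{5}{14}, -\dfrac{5}{7}, -\dfrac{15}{14} \right)$

7 $\dfrac{u \cdot v}{u \cdot u} u = \dfrac{(1, 2, -2, 1) \cdot (1, 2, 3, 0)}{1 + 4 + 9}(1, 2, 3, 0) = \left( -\dfrac{1}{14}, -\dfrac{1}{7}, -\dfrac{3}{14}, 0 \right)$

8 It makes no sense.

9 $\operatorname{proj}_D(F) \equiv \dfrac{F \cdot D}{|D|} \dfrac{D}{|D|} = (|F| \cos\theta) \dfrac{D}{|D|} = (|F| \cos\theta) u$

11 $40 \cos\left(\frac{20}{180}\pi\right) 100 = 3758.8$

13 $20 \left(\cos\frac{\pi}{4}\right) 300 = 4242.6$

15 $(-4, 3, -4) \cdot (0, 1, 0) \times 10 = 30$

17 $(2, 3, -4) \cdot \left(0, \frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}\right) 20 = -10\sqrt{2}$

19 $(1, 2, 3, 4) \cdot (2, 0, 1, 3) = 17$

21 $|a + b|^2 + |a - b|^2 =$

$|a|^2 + |b|^2 + 2 a \cdot b + |a|^2 + |b|^2 - 2 a \cdot b$
$= 2|a|^2 + 2|b|^2$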
C.4 Exercises 69 

1 If $a \neq 0$, then the condition says that $|a \times u| = |a| \sin\theta = 0$ for all angles $\theta$. Hence $a = 0$ after all.

3 |v / 374 

5 8^ 

7 113 

9 It means that if you place them so that they all have 
their tails at the same point, the three will lie in the 
same plane. 

12 $a \times b \times c$ is meaningless.

13 $(u \times v) \times w = (u \cdot w) v - (v \cdot w) u$,
$u \times (v \times w) = (w \cdot u) v - (v \cdot u) w$

14 $u \cdot (z \times w) v - v \cdot (z \times w) u$

$= [u, z, w] v - [v, z, w] u$

15 $[v, w, z] [u, v, w]$

16

18 $u \cdot u = c$. Therefore, $u' \cdot u + u \cdot u' = 0$ so $u' \cdot u = 0$.
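
The two expansions in Problem 13, and the reason $a \times b \times c$ is meaningless without parentheses in Problem 12, can be checked on concrete integer vectors. The sketch below is only an illustration; the helper names `cross`, `dot`, `scale`, and `sub` are assumptions, and the sample vectors are arbitrary.

```python
def cross(u, v):
    """Cross product of two vectors in R^3."""
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def scale(c, u):
    return tuple(c * a for a in u)


def sub(u, v):
    return tuple(a - b for a, b in zip(u, v))


if __name__ == "__main__":
    u, v, w = (1, 2, 3), (-2, 0, 5), (4, -1, 2)
    # (u x v) x w versus (u . w) v - (v . w) u
    print(cross(cross(u, v), w), sub(scale(dot(u, w), v), scale(dot(v, w), u)))
    # u x (v x w) versus (w . u) v - (v . u) w
    print(cross(u, cross(v, w)), sub(scale(dot(w, u), v), scale(dot(v, u), w)))
```

Taking $u = v = i$ and $w = j$ gives $(i \times i) \times j = 0$ while $i \times (i \times j) = -j$, so the cross product is not associative.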

C.5 Exercises 91 



h] 



1 [*=!§'» 
3 [x = l,y = 0] 

5 [x=ly=\] 

6 No solution exists. 

8 It appears that the solution exists but is not unique. 

9 It appears that there is a unique solution. 

11 There might be a solution. If so, there are infinitely 
many. 



12 No. Consider x + y + z = 2 and x + y + z = 1. 

13 These can have a solution. For example, 

x + y = 1, 2x + 2y = 2, 3x + 3y = 3 even has an 
infinite set of solutions. 

14 $h = 4$

15 Any $h$ will work.

16 Any $h$ will work.

17 If $h \neq 2$ there will be a unique solution for any $k$. If $h = 2$ and $k \neq 4$, there are no solutions. If $h = 2$ and $k = 4$, then there are infinitely many solutions.

19 There is no solution.

20 [w=%y-l,x 



3 2^' Z 



22 s = t,y=f+tQ), 



x = ^ — ^t where £ G 



l 

2 2 L 



23 $z = t$, $y = 4t$, $x = 2 - 4t$.

24 $x_5 = t$, $x_4 = 1 - 6t$, $x_3 = -1 + 7t$, $x_2 = 4t$, $x_1 = 3 - 9t$

25 £5 = t, £3 = 5. Then the other variables are given 





by 


X4. — 2 


3/ 

2 6 > 






x 2 


2 L 2' 1 


2 ^ 2 6 


25. 


26 $[x = 1 - 2t, z = 1, y = t]$

27 $[x = 2 - 4t, y = -8t, z = t]$

28 $[x = -1, y = 2, z = -1]$

29 $[x = 2, y = 4, z = 5]$

30 $[x = 1, y = 2, z = -5]$

31 $[x = -1, y = -5, z = 4]$

32 $[x = 2t + 1, y = -4t, z = t]$

33 $[x = 1, y = 5, z = 3]$

34 $[x = 4, y = -4, z = -2]$





35 These are not legitimate row operations. They do 
not preserve the solution set of the system. 

36 {g = 60,1 = 90, b = 200, s = 50} 

37 [w = 15, x = 15, y = 20, z = 10] . 
C.6 Exercises 117

i ' " 3 ~ 6 ~ 9 



-6 -3 -21 ] ' 

8-5 3 

-li 5 -4 ; ' 



-3 3 4 

Not possible, ( ) , Not possible, Not 



i j k 

14 (] x u : uoi uj 2 (jOs 
Ui u 2 u 3 

= iuj 2 u 3 -1UJ3U2 -j^i^ 3 +jo;3^i + kwiu 2 -kuj 2 ui. 

In terms of matrices, this is 




. The matrix is of the form 



for suitable uja since it is 



Not possible, 9 



(-7 1 5), 



Not possible, 



-5 / ' 



1 3 
3 9 



7 



,10 



UJ 2 U 3 - ^3^2 

UJiU 2 - UJ 2 U\ 
—UJ3 UJ 2 

ous — uj\ 

—uj 2 uj\ 
skew symmetric. Now you note that multiplication 
on the left by this matrix gives the same thing as 
the cross product. 

15 $-A = -A + (A + B)$

$= (-A + A) + B = 0 + B = B$

16 $0' = 0' + 0 = 0$.

17 $0A = (0 + 0)A = 0A + 0A$. Now add $-(0A)$ to both sides. Then $0 = 0A$.

18 $A + (-1)A = (1 + (-1))A = 0A = 0$. Therefore, from the uniqueness of the additive inverse, it follows that $-A = (-1)A$.



4 [ 5 ] , Not possible, ( -14 13 2 ), Not possible, 



19 UaA + PB) 

= (aA + PB)- = aAji 






11, 



-1 -3 
4 12 



-x -y 




= a(A T ).. + (j3B)i j = (aA T + l3B 

20 (U^E^.Ufc^^. 

24 Explain why if $AB = AC$ and $A^{-1}$ exists, then $B = C$.

Because you can multiply on the left by $A^{-1}$.

9 $[k = 4]$

10 There is no possible choice of $k$ which will make these matrices commute.

11 To get $-A$, just replace every entry of $A$ with its additive inverse. The 0 matrix is the one which has all zeros in it.



28 



29 



30 



31 



12 A 



A+A T | A-A 1 



2 1 
-1 3 

1 
5 3 

2 1 

3 

2 1 

4 2 



= ( 1 2 

7 7 



_3 1 

5 5 

1 



does not exist. The row reduced ech- 



2 ' 2 
13 If $A = -A^T$, then $a_{ii} = -a_{ii}$ and so each $a_{ii} = 0$.



elon form of this matrix is 



- 1 - 2 
32



33 



34 



35 




assuming ad — be ^ of 



row echelon form: 



There is no inverse. 



36 



/-I 
3 

-1 
-2 



V 



1 


1 


1 


? 


\ 


2 5 


2 


2 


2 








1 


3 


1 


9 


4 


4 


4 



37 A 



( 1 -1 2 \ 

10 2 

3 

\ 1 3 3/ 



38 Write 



( x\+ 3x 2 + 2x 3 \ 

2x 3 + xi 

6x 3 

\ £4 + 3x 2 + #1 / 

where A is an appropriate matrix. 

/ 1 3 2 \ 

10 2 

6 

\ 1 3 1 / 



in the form A 



x 2 

x 3 

\ x 4 J 



39 A: 



/ 1 1 1 \ 

112 

-10 10 

\ 1 3/ 



42 



/I 
1 

2 

/I 

1 

2 

Vi 



2\ 



2 

2\ 



2 

2/ 



, Thus the solution is 



f a\ 

b 



( * \ 

y 

\w J 



\dj 



a + 2b + 2d \ 
a + b + 2c 
2a + b-3c + 2d 

\ a + 2b + c + 2d J 

43 Multiply on the left of both sides by $A^{-1}$.

44 Multiply on both sides on the left by $A^{-1}$. Thus

$0 = A^{-1} 0 = A^{-1}(Ax) = (A^{-1} A) x = Ix = x$

45 $A^{-1} = A^{-1} I = A^{-1}(AB) = (A^{-1} A) B = IB = B$.

46 You just need to show that $(A^{-1})^T$ acts like the inverse of $A^T$ because from uniqueness in the above problem, this will imply it is the inverse. But from properties of the transpose,

$A^T (A^{-1})^T = (A^{-1} A)^T = I^T = I$
$(A^{-1})^T A^T = (A A^{-1})^T = I^T = I$

Hence $(A^{-1})^T = (A^T)^{-1}$ and this last matrix exists.

47 $(AB) B^{-1} A^{-1} = A (B B^{-1}) A^{-1} = A A^{-1} = I$
$B^{-1} A^{-1} (AB) = B^{-1} (A^{-1} A) B = B^{-1} I B = B^{-1} B = I$

51 $Ax \cdot y = \sum_k (Ax)_k y_k = \sum_k \sum_i A_{ki} x_i y_k = \sum_i \sum_k (A^T)_{ik} x_i y_k = (x, A^T y)$
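
The index computation in Problem 51 says that moving a matrix across the dot product replaces it by its transpose. A short check on an arbitrary integer example follows; the helper names `matvec`, `transpose`, and `dot` are assumptions made for this sketch.

```python
def matvec(A, x):
    """Matrix times vector, with A given as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def transpose(A):
    """Transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*A)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


if __name__ == "__main__":
    A = [[1, -2, 3], [0, 4, 5]]   # a 2 x 3 matrix
    x = [2, 1, -1]                # vector in R^3
    y = [3, -2]                   # vector in R^2
    print(dot(matvec(A, x), y), dot(x, matvec(transpose(A), y)))
```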

C.7 Exercises 141 

2 6 

3 2 

4 6 

5 -4 

6 -6 

7 -32 

8 63 

9 211 

11 It does not change the determinant. This was just taking the transpose.

13 The determinant is unchanged. It was just the first row added to the second.

15 In this case the two columns were switched so the determinant of the second is $-1$ times the determinant of the first.

17 $\det(aA) = \det(aIA) = \det(aI)\det(A) = a^n \det(A)$. The matrix which has $a$ down the main diagonal has determinant equal to $a^n$.



19 This is not true at all. Consider A = 



-1 



1 
1 



,B = 







-1 



20 It must be 0 because $0 = \det(0) = \det(A^k) = (\det(A))^k$.

21 $\det(A) = 1$, or $-1$.

23 If $A = S^{-1} B S$, then $S A S^{-1} = B$ and so if $A \sim B$, then $B \sim A$. It is obvious that $A \sim A$ because you can let $S = I$. Say $A \sim B$ and $B \sim C$. Then $A = P^{-1} B P$ and $B = Q^{-1} C Q$. Therefore,

$A = P^{-1} Q^{-1} C Q P = (QP)^{-1} C (QP)$

and so $A \sim C$.

25 Say $M = S^{-1} N S$. Then

$\det(\lambda I - M)$

$= \det(\lambda I - S^{-1} N S)$

$= \det(\lambda S^{-1} S - S^{-1} N S)$

$= \det(S^{-1} (\lambda I - N) S)$

$= \det(S^{-1}) \det(\lambda I - N) \det(S)$

$= \det(\lambda I - N) \det(S^{-1}) \det(S)$

$= \det(\lambda I - N)$
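
For $2 \times 2$ matrices the characteristic polynomial is $\lambda^2 - \operatorname{tr}(N)\lambda + \det(N)$, so the computation above predicts that trace and determinant are unchanged by similarity. The following sketch checks this exactly on one example; it is restricted to the $2 \times 2$ case, and the helper names and sample matrices are assumptions chosen for illustration.

```python
from fractions import Fraction


def mul2(A, B):
    """Product of two 2 x 2 matrices."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)
    ]


def det2(A):
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]


def inv2(S):
    """Inverse of an invertible 2 x 2 matrix via the adjugate formula."""
    d = Fraction(det2(S))
    return [[S[1][1] / d, -S[0][1] / d], [-S[1][0] / d, S[0][0] / d]]


def char_poly2(A):
    """Coefficients (1, -trace, det) of lambda^2 - tr(A) lambda + det(A)."""
    return (1, -(A[0][0] + A[1][1]), det2(A))


if __name__ == "__main__":
    N = [[2, 1], [0, 3]]
    S = [[1, 2], [3, 5]]
    M = mul2(mul2(inv2(S), N), S)   # M = S^{-1} N S
    print(char_poly2(N), char_poly2(M))
```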



31 



39 Suppose $A, B$ are $n \times n$ matrices and that $AB = I$. Show that then $BA = I$. Hint: You might do something like this: First explain why $\det(A)$, $\det(B)$ are both nonzero. Then $(AB) A = A$ and then show $BA (BA - I) = 0$. From this use what is given to conclude $A (BA - I) = 0$. Then use Problem 38.

You have $1 = \det(A) \det(B)$. Hence both $A$ and $B$ have inverses. Letting $x$ be given,

$A (BA - I) x = (AB) Ax - Ax$
$= Ax - Ax = 0$

and so it follows from the above problem that

$(BA - I) x = 0$.

Since $x$ is arbitrary, it follows that $BA = I$.



40 



41 



e~ l 

e~ l (cos t + sin t) — (sin t) e~ l 
—e~ l (cost — sin t) (cost)e _t 



2 e 







2 e 



| cos t-\-\ sin t — sin t \ sin t—\ cos t 
| sin t —\ cos £ cos £ — \ cos £ — | sin £ 



C.8 Exercises 202 



4 a. is not, b. is, and so is c. 



1 








n "l 


/I 






1 








1 





O o 








1 








1 


"2/ 


Vo 





0/ 


1 








°\ 











1 


1 

2 




















1 / 









1 





-3 


2 


1 


5 


3 


3 


3 


2 


1 


2 


3 


3 


3 


1 

2 


3 
2 


(M 


3 


9 


1 


1 

2 


I 

2 


oy 



7 It is because you cannot have more than min (m, n) 
nonzero rows in the row reduced echelon form. Re- 
call that the number of pivot columns is the same 
as the number of nonzero rows from the description 
of this row reduced echelon form. 



33 



38 Show that if $\det(A) \neq 0$ for $A$ an $n \times n$ matrix, it follows that if $Ax = 0$, then $x = 0$.

If $\det(A) \neq 0$, then $A^{-1}$ exists and so you could multiply on both sides on the left by $A^{-1}$ and obtain that $x = 0$.




is a basis. 



is a basis. 



11 Yes. This is a subspace. It is closed with respect to 
vector addition and scalar multiplication. 

13 Yes, this is a subspace. 
15 This is a subspace.

17 Not a subspace.

19 $\sum_{i=1}^{k} 0 x_i = 0$

21 If $AB = I$, then $B$ must be one to one. Otherwise there exists $x \neq 0$ such that $Bx = 0$. But then you would have

$x = Ix = ABx = A0 = 0$

In particular, the columns of $B$ are linearly independent. Therefore, $B$ is also onto. Also,

$(BA - I) Bx = B (AB) x - Bx = 0$

Since $B$ is onto, it follows that $BA - I$ maps every vector to 0 and so this matrix is 0. Thus $BA = I$.

23 These vectors are not linearly independent. They are linearly dependent. In fact $-1$ times the first added to 2 times the second is the third.

25 These cannot be linearly independent because there are 4 of them. You can have at most three. However, it might be that they span $\mathbb{R}^3$.

14 3 2 

2 3 1 4 I , row echelon form: 

3 3 6 

1 -1 2 \ 

1 1 I . The dimension of the span of 
0/ 
these vectors is 2 so they do not span R 3 . 

27 These vectors are not linearly independent and so 
they are not a basis. The remaining question is 
whether they span. 

14 12 

3 2 4 1, row echelon form: 

3 3 



33 It is a subspace and it equals the span of the vectors 
which form the columns of the following matrix. 

/ 2 1 \ 

, row echelon form: 




1 








1 

M 




0/ 



It follows that the dimension of 



is< 



this subspace equals 3. A basis 

f/0\ /2\ / 1\) 

1 3 

1 ' 1 ' 

I V o / W WJ 

37 This is obvious. If $x, y \in V \cap W$, then for scalars $\alpha, \beta$, the linear combination $\alpha x + \beta y$ must be in both $V$ and $W$ since they are both subspaces.

39 Let $\{x_1, \cdots, x_k\}$ be a basis for $V \cap W$. Then there is a basis for $V$ and $W$ which are respectively

$\{x_1, \cdots, x_k, y_{k+1}, \cdots, y_p\}$, $\{x_1, \cdots, x_k, z_{k+1}, \cdots, z_q\}$

It follows that you must have

$k + p - k + q - k \leq n$

and so you must have

$p + q - n \leq k$

41 No. There must then be infinitely many solutions. If the system is $Ax = b$, then there are infinitely many solutions to $Ax = 0$ and so the solutions to $Ax = b$ are a particular solution to $Ax = b$ added to the solutions to $Ax = 0$ of which there are infinitely many.



1 \ 

1 .The dimension of the span of 43 Yes ' lt has a uni( l ue solution. 
12/ 
these vectors is 3 and so they do span R 3 . 



31 Yes it is. It is the span of the vectors 



1 




Since these two vectors are a linearly independent 
set, the given subspace has dimension 2. 



45 a. Infinite solution set. b. This surely can't happen. 
If you add in another column, the rank does not get 
smaller, c. You can't have the rank equal 4 if you 
only have two columns, d. In this case, there is no 
solution to the system of equations represented by 
the augmented matrix, e. In this case, there is a 
unique solution since the columns of A are indepen- 
dent. The columns are independent. Therefore, A is 
one to one. 
49 Suppose $ABx = 0$. Then $Bx \in \ker(A) \cap B(F^p)$ and so $Bx = \sum_{i=1}^{k} Bz_i$ showing that

\[ x - \sum_{i=1}^{k} z_i \in \ker(B). \]

Consider $B(F^p) \cap \ker(A)$ and let a basis be $\{w_1, \cdots, w_k\}$.

Then each $w_i$ is of the form $Bz_i = w_i$. Therefore, $\{z_1, \cdots, z_k\}$ is linearly independent and $ABz_i = 0$. Now let $\{u_1, \cdots, u_r\}$ be a basis for $\ker(B)$. If $ABx = 0$, then $Bx \in \ker(A) \cap B(F^p)$ and so

\[ Bx = \sum_{i=1}^{k} c_i B z_i \]

which implies

\[ x - \sum_{i=1}^{k} c_i z_i \in \ker(B) \]

and so it is of the form

\[ x - \sum_{i=1}^{k} c_i z_i = \sum_{j=1}^{r} d_j u_j. \]

It follows that if $ABx = 0$ so that $x \in \ker(AB)$, then

$x \in \operatorname{span}(z_1, \cdots, z_k, u_1, \cdots, u_r)$.

Therefore,

$\dim(\ker(AB))$

$\leq k + r = \dim(B(F^p) \cap \ker(A)) + \dim(\ker(B))$

$\leq \dim(\ker(A)) + \dim(\ker(B))$

51 If $\det(A - \lambda I) = 0$ then $(A - \lambda I)^{-1}$ does not exist and so the columns are not independent which means that for some $x \neq 0$, $(A - \lambda I) x = 0$.

53 Since $A_1$ is not one to one, it follows there exists $x \neq 0$ such that $A_1 x = 0$. Hence $Ax = 0$ although $x \neq 0$ so it follows that $A$ is not one to one. From another point of view, if $A$ were one to one, then $\ker(A)^{\perp} = \mathbb{R}^n$ and so by the Fredholm alternative, $A^T$ would be onto $\mathbb{R}^n$. However, $A^T$ has only $m$ columns so this cannot take place.

54 That $(A^T A)^T = A^T A$ follows from the properties of the transpose. Therefore,

$\left( \ker\left( (A^T A)^T \right) \right)^{\perp} = \left( \ker(A^T A) \right)^{\perp}$

Suppose $A^T A x = 0$. Then $(A^T A x, x) = (Ax, Ax)$ and so $Ax = 0$. Therefore,

$(A^T b, x) = (b, Ax) = (b, 0) = 0$

It follows that $A^T b \in \left( \ker\left( (A^T A)^T \right) \right)^{\perp}$ and so there exists a solution $x$ to the equation

$A^T A x = A^T b$

by the Fredholm alternative.

55

$|b - Ay|^2$
$= |b - Ax + Ax - Ay|^2$
$= |b - Ax|^2 + |Ax - Ay|^2 + 2 (b - Ax, A(x - y))$
$= |b - Ax|^2 + |Ax - Ay|^2 + 2 (A^T b - A^T A x, (x - y))$
$= |b - Ax|^2 + |Ax - Ay|^2$

and so, $Ax$ is closest to $b$ out of all vectors $Ay$.
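
Problems 54 and 55 together describe least squares: solve $A^T A x = A^T b$, and then $Ax$ is the closest point to $b$ among all vectors $Ay$. The sketch below works one small example exactly. It assumes $A$ has two independent columns, so that $A^T A$ is an invertible $2 \times 2$ matrix that can be handled by Cramer's rule; the function names and the data are illustrative assumptions.

```python
from fractions import Fraction


def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def solve_normal_equations_2(A, b):
    """Solve A^T A x = A^T b when A has exactly two independent columns."""
    At = transpose(A)
    G = matmul(At, A)            # the 2 x 2 matrix A^T A
    c = matvec(At, b)            # the vector A^T b
    det = Fraction(G[0][0] * G[1][1] - G[0][1] * G[1][0])
    x0 = (c[0] * G[1][1] - G[0][1] * c[1]) / det
    x1 = (G[0][0] * c[1] - c[0] * G[1][0]) / det
    return [x0, x1]


def sq_dist(b, A, y):
    """|b - Ay|^2."""
    return sum((bi - ai) ** 2 for bi, ai in zip(b, matvec(A, y)))


if __name__ == "__main__":
    A = [[1, 0], [1, 1], [1, 2]]
    b = [6, 0, 0]
    x = solve_normal_equations_2(A, b)
    print(x, sq_dist(b, A, x))
```

In this example the residual $b - Ax$ is orthogonal to both columns of $A$, which is exactly the vanishing of the cross term in the computation of Problem 55.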

56 The dimension of $F^{n \times n}$ is $n^2$. Therefore, there exist scalars $c_k$, not all zero, such that

\[ \sum_{k=0}^{n^2} c_k A^k = 0. \]

Let $p(\lambda)$ be the monic polynomial having smallest degree such that $p(A) = 0$. If $q(A) = 0$ then from the Euclidean algorithm,

$q(\lambda) = p(\lambda) l(\lambda) + r(\lambda)$

where the degree of $r(\lambda)$ is less than the degree of $p(\lambda)$ or else $r(\lambda)$ equals 0. However, if it is not zero, you could plug in $A$ and obtain

$0 = q(A) = 0 + r(A)$

and this would contradict the definition of $p(\lambda)$ as being the polynomial having smallest degree which sends $A$ to 0. Hence $q(\lambda) = p(\lambda) l(\lambda)$.
C.9 Exercises 225



i 

2 

hV3 



iVs 



\yft 



2V - \V2 
IV2 \J2 



1 1 

2 2 

•W3 



>/3 



_1 

2 

iV3 



;V3 

' 1 
2 



i\/2V3+i\/2 i\/2-i\/2V3 



_ i 

1 
2 

4V3 



rV^ 



r\/3 
' 1 

"2 



;VS 



_1 
1^ 



16 



17 






IV3 

1 

2 





-I 

£V3 




21

$T_u(av + bw)$

$= av + bw - \dfrac{(av + bw \cdot u)}{|u|^2} u$

$= av - a \dfrac{(v \cdot u)}{|u|^2} u + bw - b \dfrac{(w \cdot u)}{|u|^2} u$

$= a T_u(v) + b T_u(w)$



25 
26 



2a6 



2a6 



$$(I - 2uu^T)^T(I - 2uu^T) = (I - 2uu^T)(I - 2uu^T)$$

$$= I - 2uu^T - 2uu^T + 4uu^Tuu^T = I$$

because $u^Tu = 1$.

Now, why does this matrix preserve distance? For
short, call it $Q$ and note that $Q^TQ = I$. Then

$$|x|^2 = (Q^TQx, x) = (Qx, Qx) = |Qx|^2$$

and so $Q$ preserves distances.

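27 From the above problem this preserves distances
and $Q^T = Q$. Now do it to $x$.

$$Q(x - y) = x - y - 2\frac{x - y}{|x - y|^2}(x - y, x - y) = y - x$$

$$Q(x + y) = x + y - 2\frac{x - y}{|x - y|^2}(x - y)^T(x + y)$$

$$= x + y - 2\frac{x - y}{|x - y|^2}\left(|x|^2 - |y|^2\right) = x + y$$

where the last equality uses $|x| = |y|$, and so

$$Qx - Qy = y - x$$
$$Qx + Qy = x + y$$

Hence, adding these yields $Qx = y$ and then subtracting
them gives $Qy = x$.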
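The reflection used in 26 and 27 is easy to try on numbers. This is a minimal sketch with made-up vectors `x` and `y` of equal length; the function names are illustrative only. It builds $Q = I - 2vv^T/|v|^2$ with $v = x - y$ and applies it.

```python
from fractions import Fraction

def householder(v):
    """Return Q = I - 2 v v^T / |v|^2 as a list of rows, in exact arithmetic."""
    n = len(v)
    norm_sq = sum(Fraction(c) * c for c in v)
    return [[(1 if i == j else 0) - 2 * Fraction(v[i]) * v[j] / norm_sq
             for j in range(n)] for i in range(n)]

def apply(Q, x):
    """Matrix times vector."""
    return [sum(q * c for q, c in zip(row, x)) for row in Q]

# Hypothetical example with |x| = |y| = 5.
x = [3, 4]
y = [5, 0]
Q = householder([a - b for a, b in zip(x, y)])

Qx = apply(Q, x)              # expected to equal y
Qy = apply(Q, y)              # expected to equal x
QQx = apply(Q, apply(Q, x))   # Q is symmetric and Q^T Q = I, so this is x again
```

If $|x| \neq |y|$ the same construction still gives an orthogonal $Q$, but $Qx = y$ fails, which matches the role of $|x|^2 - |y|^2$ in the computation above.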

28 Linear transformations take 0 to 0. Also
$T_a(u + v) \neq T_au + T_av$.
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38 



39 



( ° ^ 






-i 

-i 


, ieR 


{ i ) 


( ° \ / 2 \ 


-i 

-i 


+ 


-1 
-1 


\ i ) V J 


( ~i\ ( - 8 \ 


i 

i 


+ 


5 



K o J 




V 5 J 



41 



43 A basis is < 



48 



f/-M 




( - x \) 


i 




i 


2 


i 


> 


U o ) 




I 1 /J 



H-ii; 



/ -2 \ 

1/2 

1 



v o / 
-i 




V i ) 



t 



( -i \ 

-1/2 

1 

V o / 

/ 4 \ 

7/2 



V o / 



51 If $x, y \in \ker(A)$ then

$$A(ax + by) = aAx + bAy = a0 + b0 = 0$$

and so $\ker(A)$ is closed under linear combinations.
Hence it is a subspace.



C.10 Exercises 244




1-2-5 

-2 5 11 3 

3 -6 -15 1 




V i 



2 1 / \ / 



/-I 

1 3 

3 9 

\ 4 12 



3 -1\ 


16 J 



\ / -1 





( l ° 

-110 
-3010 

\ -4 -4 1 / 







V o 



-1\ 

-1 

-3 
J 



11 An LU factorization of the coefficient matrix is 




First solve 




which yields u = 1, v = 2, w = 6. Next solve
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This yields z = 6, y = — 16, x = 27. 



14 



20 



1 

1 

1 

1 



1 

1 1 

1 

1 



1 





0\ 







11 

n 



13 



11 



ii V- £Vn 

V2 



1 1/ 



66 vVn -IV2 



22 You would have $QRx = b$ and so then you would
have $Rx = Q^Tb$. Now $R$ is upper triangular and so
the solution of this problem is fairly simple.
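The reason an upper triangular system is "fairly simple" is back substitution. A small sketch follows; the matrix `R` and right side `c` (standing in for $Q^Tb$) are invented for illustration.

```python
from fractions import Fraction

def back_substitute(R, c):
    """Solve R x = c for an upper triangular R with nonzero diagonal entries."""
    n = len(c)
    x = [Fraction(0)] * n
    # Work from the last equation upward; each step has one new unknown.
    for i in range(n - 1, -1, -1):
        known = sum(R[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (Fraction(c[i]) - known) / R[i][i]
    return x

# Hypothetical upper triangular system R x = c.
R = [[2, 1, -1],
     [0, 3, 2],
     [0, 0, 4]]
c = [1, 8, 4]
x = back_substitute(R, c)
```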

C.11 Exercises 276

1 The minimum is —11/2 and it occurs when x\ = 

%3 — %6 — and X2 = 7/2, £4 = 13/2, #6 — —11/2. 

The maximum is 7 and it occurs when $x_1 = 7$, $x_2 = 0$,
$x_3 = 0$, $x_4 = 3$, $x_5 = 5$, $x_6 = 0$.

2 Maximize and minimize the following if possible. 
All variables are nonnegative. 



(a) The maximum is 7 when $x_1 = 7$ and $x_2, x_3 = 0$.

The minimum is $-7$ and it happens when $x_1 = 0$, $x_2 = 7/2$, $x_3 = 0$.

(b) The minimum is $-21$ and it occurs when $x_1 = x_2 = 0$, $x_3 = 7$.

The maximum is 7 and it occurs when $x_1 = 7$, $x_2 = 0$, $x_3 = 0$.

(c) The minimum is 0 and it occurs when $x_1 = x_2 = 0$, $x_3 = 1$.

The maximum is 14 and it happens when $x_1 = 7$, $x_2 = x_3 = 0$.

(d) The maximum is 7 and it happens when $x_2 = 7/2$, $x_3 = x_1 = 0$.

The minimum is 0 when $x_1 = x_2 = 0$, $x_3 = 1$.

4 Find solutions if possible. 

(a) There is no solution to these inequalities with
$x_1, x_2 \geq 0$.

(b) A solution is $x_1 = 8/5$, $x_2 = x_3 = 0$.

(c) No solution to these inequalities for which all
the variables are nonnegative.

(d) There is a solution when $x_2 = 2$, $x_3 = 0$, $x_1 = 0$.

(e) There is no solution. 



C.12 Exercises 308 

3 If it did have $\lambda \in \mathbb{R}$ as an eigenvalue, then there
would exist a vector $x$ such that $Ax = \lambda x$ for $\lambda$ a
real number. Therefore, $Ax$ and $x$ would need to
be parallel. However, this doesn't happen because
$A$ rotates the vectors.

5 $A^mx = \lambda^mx$ for any integer $m$. In the case of $-1$,

$$A^{-1}\lambda x = AA^{-1}x = x$$

so $A^{-1}x = \lambda^{-1}x$. Thus the eigenvalues of $A^{-1}$ are
just $\lambda^{-1}$ where $\lambda$ is an eigenvalue of $A$.

7 Let $x$ be the eigenvector. Then $A^mx = \lambda^mx$, $A^mx =
Ax = \lambda x$ and so

$$\lambda^m = \lambda$$

Hence if $\lambda \neq 0$, then

$$\lambda^{m-1} = 1$$

and so $|\lambda| = 1$.
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3 | [2 

9 eigenvectors: ( 1 > ^ 1, < 1 }> <-> 2. This is a 

defective matrix. 




This matrix is not defective because, even though 
A = 1 is a repeated eigenvalue, it has a 2 dimen- 
sional eigenspace. 




1 I > <->6 
This matrix is not defective. 




This matrix is defective. In this case, there is only 
one eigenvalue, —1 of multiplicity 3 but the dimen- 
sion of the eigenspace is only 2. 



f 



18 eigenvectors: ^ l , 
1 



i 



19 eigenvectors: ^ t , 
1 

This is defective. 



«->> This one is defective. 



** 1 




25 ei 




o 2 - 2i, 



z I > <H> 2 + 2i This matrix is not defective. 




o 2 - 6i, 



^2 + 6i 



This is not defective. 

27 The characteristic polynomial is of degree three and 
it has real coefficients. Therefore, there is a real 
root and two distinct complex roots. It follows that 
A cannot be defective because it has three distinct 
eigenvalues. 



29 eigenvectors: 



1(0} 



{(?)} 



<-> —i, 



<-> i 



31 eigenvectors: 




*+-l, 



33 eigenvectors: 



<->l, 



35 In terms of percentages in the various locations, 

21.429 
21.429 

57.143
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37 Obviously A cannot be onto because the range of 
A has dimension 1 and the dimension of this space 
should be 3 if the matrix is onto. Therefore, A can- 
not be invertible. Its row reduced echelon form can- 
not be / since if it were, A would be onto. Aw = w 
so it has an eigenvalue equal to 1. Now suppose
$Ax = \lambda x$. Thus, from the Cauchy Schwarz inequality,

$$|x| \geq \frac{|(x, w)|}{|w|^2}|w| = |\lambda||x|$$

and so $|\lambda| \leq 1$.



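39 Since the vectors are linearly independent, the matrix
$S$ has an inverse. Denoting this inverse by

$$S^{-1} = \begin{pmatrix} w_1^T \\ \vdots \\ w_n^T \end{pmatrix}$$

it follows by definition that

$$w_i^Tx_j = \delta_{ij}.$$

Therefore,

$$S^{-1}MS = S^{-1}(Mx_1, \cdots, Mx_n) = \begin{pmatrix} w_1^T \\ \vdots \\ w_n^T \end{pmatrix}(\lambda_1x_1, \cdots, \lambda_nx_n) = \begin{pmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{pmatrix}$$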



46 Suppose $A$ is skew symmetric. Then what about
$iA$?

$$(iA)^* = -iA^* = -iA^T = iA$$

and so $iA$ is self adjoint. Hence it has all real eigenvalues.
Therefore, the eigenvalues of $A$ are all of
the form $i\lambda$ where $\lambda$ is real. Now what about the
eigenvectors? You need 

Ax = iAx 

where A ^ is real and A is real. Then 

ARe(x) =zARe(x) 

The left has all real entries and the right has all 
pure imaginary entries. Hence Re (x) = and so x 
has all imaginary entries. 

C.13 Exercises 354 

1 a. orthogonal and transformation, b. symmetric, c. 
skew symmetric. 

4 $\|Ux\|^2 = (Ux, Ux) = (U^TUx, x) = (Ix, x) = \|x\|^2$

Next suppose distance is preserved by $U$. Then

$$(U(x + y), U(x + y)) = \|Ux\|^2 + \|Uy\|^2 + 2(Ux, Uy)$$
$$= \|x\|^2 + \|y\|^2 + 2(U^TUx, y)$$

But since $U$ preserves distances, it is also the case
that

$$(U(x + y), U(x + y)) = \|x\|^2 + \|y\|^2 + 2(x, y)$$

Hence

$$(x, y) = (U^TUx, y)$$

and so

$$((U^TU - I)x, y) = 0$$

Since $y$ is arbitrary, it follows that $U^TU - I = 0$.
Thus $U$ is orthogonal.

41 The diagonally dominant condition implies that none
of the Gerschgorin disks contain 0. Therefore, 0 is
not an eigenvalue. Hence $A$ is one to one, hence
invertible.

43 First note that $(AB)^* = B^*A^*$. Say $Mx = \lambda x$, $x \neq 0$.
Then

$$\bar{\lambda}|x|^2 = \bar{\lambda}x^*x = (\lambda x)^*x = (Mx)^*x = x^*M^*x$$
$$= x^*Mx = x^*\lambda x = \lambda|x|^2$$

Hence $\lambda = \bar{\lambda}$.



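$$\begin{pmatrix} x & y & z \end{pmatrix} \begin{pmatrix} a_1 & a_4/2 & a_5/2 \\ a_4/2 & a_2 & a_6/2 \\ a_5/2 & a_6/2 & a_3 \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix}$$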
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7 If A is symmetric, then A = U T DU for some D a 
diagonal matrix in which all the diagonal entries are 
non zero. Hence $A^{-1} = U^{-1}D^{-1}U^{-T}$. Now

$$U^{-1}U^{-T} = (U^TU)^{-1} = I^{-1} = I$$

and so $A^{-1} = QD^{-1}Q^T$, where $Q$ is orthogonal. Is
this thing on the right symmetric? Take its transpose.
This is $QD^{-1}Q^T$ which is the same thing, so
it appears that a symmetric matrix must have symmetric
inverse. Now consider raising it to a power.

$$A^k = U^TD^kU$$

and the right side is clearly symmetric.

9 Yes. 

11 eigenvectors: { ( ) } ++ c, 




12 eigenvectors: 



O a — ib, 



13 eigenvectors: 




**6, 



O 12, 



O 18 



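15

$$\lambda x^T\bar{x} = (Ax)^T\bar{x} = x^TA^T\bar{x} = x^TA\bar{x} \quad \text{(using } (CD)^T = D^TC^T \text{ and } A = A^T\text{)}$$
$$= x^T\overline{Ax} \quad \text{(} A \text{ is real)}$$
$$= x^T\overline{\lambda x} = \bar{\lambda}x^T\bar{x} \quad \text{(} \lambda \text{ is an eigenvalue)}$$

and so $\lambda = \bar{\lambda}$. This shows that all eigenvalues are
real. It follows all the eigenvectors are real. Why?

Because $A$ is real. $Ax = \lambda x$, $A\bar{x} = \lambda\bar{x}$, so $x + \bar{x}$ is
an eigenvector. Hence it can be assumed all eigenvectors
are real.

Now let $x$, $y$, $\mu$ and $\lambda$ be given as above.

$$\lambda(x \cdot y) = \lambda x \cdot y = Ax \cdot y = x \cdot Ay = x \cdot \mu y = \mu(x \cdot y)$$

and so

$$(\lambda - \mu)x \cdot y = 0.$$

Since $\lambda \neq \mu$, it follows $x \cdot y = 0$.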



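17

$$\lambda x \cdot x = Ax \cdot x = x \cdot Ax \quad \text{(} A \text{ Hermitian)}$$
$$= x \cdot \lambda x = \bar{\lambda}x \cdot x \quad \text{(rule for complex inner product)}$$

and so $\lambda = \bar{\lambda}$. This shows that all eigenvalues are
real. Now let $x$, $y$, $\mu$ and $\lambda$ be given as above.

$$\lambda(x \cdot y) = \lambda x \cdot y = Ax \cdot y = x \cdot Ay = x \cdot \mu y$$
$$= \bar{\mu}(x \cdot y) \quad \text{(rule for complex inner product)}$$
$$= \mu(x \cdot y)$$

and so

$$(\lambda - \mu)x \cdot y = 0.$$

Since $\lambda \neq \mu$, it follows $x \cdot y = 0$.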
1 3 



19 Certainly not. 



29 eigenvectors: 

-§V6 



2 



Ax T x=(,4x) x 



-gV6 I ?*+6, 

-IV2 
i^2 I Ul2, 
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\V3 

\V3 



++18. 



The matrix U has these as its columns. 



34 eigenvectors: 




<-»0, 



«1, 



<-> 2. The columns are these vectors. 



37 If $A$ is given by the formula, then

$$A^T = U^TD^TU = U^TDU = A$$

Next suppose $A = A^T$. Then by the theorems on
symmetric matrices, there exists an orthogonal matrix
$U$ such that

$$UAU^T = D$$

for $D$ diagonal. Hence

$$A = U^TDU$$

39 There exists $U$ unitary such that $A = U^*TU$ where
$T$ is upper triangular. Thus $A$ and $T$ are
similar. Hence they have the same determinant.
Therefore, $\det(A) = \det(T)$, but $\det(T)$ equals the
product of the entries on the main diagonal which
are the eigenvalues of $A$.



43 



44 $y = -0.125x^2 + 1.425x + 0.925$

46 Find an orthonormal basis for the spans of the fol- 
lowing sets of vectors. 

(a) (3, -4,0), (7, -1,0), (1,7,1). 
3/5 \ / 4/5 \ / 

(b) (3, 0,-4), (11, 0,2), (1,1, 7) 


1 





(c) (3, 0,-4), (5, 0,10), (-7, 1,1) 






47 




m^ 

1 



V2 



49 




^V5VT4 



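51 It satisfies the properties of an inner product. Note
that

$$\text{trace}(AB^*) = \sum_i\sum_k A_{ik}\overline{B_{ik}} = \overline{\sum_k\sum_i B_{ik}\overline{A_{ik}}} = \overline{\text{trace}(BA^*)}$$

so

$$(A, B)_F = \overline{(B, A)_F}$$

The product is obviously linear in the first argument.
If $(A, A)_F = 0$, then

$$\sum_i\sum_k A_{ik}\overline{A_{ik}} = \sum_{i,k}|A_{ik}|^2 = 0$$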



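52 From the singular value decomposition,

$$U^*AV = \begin{pmatrix} \sigma & 0 \\ 0 & 0 \end{pmatrix}, \quad A = U\begin{pmatrix} \sigma & 0 \\ 0 & 0 \end{pmatrix}V^*$$

Then

$$\text{trace}(AA^*) = \text{trace}\left(U\begin{pmatrix} \sigma & 0 \\ 0 & 0 \end{pmatrix}V^*V\begin{pmatrix} \sigma & 0 \\ 0 & 0 \end{pmatrix}U^*\right)$$

$$= \text{trace}\left(U\begin{pmatrix} \sigma^2 & 0 \\ 0 & 0 \end{pmatrix}U^*\right) = \text{trace}\begin{pmatrix} \sigma^2 & 0 \\ 0 & 0 \end{pmatrix} = \sum_j \sigma_j^2$$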

53 $\text{trace}(AB) = \sum_i\sum_k A_{ik}B_{ki}$,

$\text{trace}(BA) = \sum_i\sum_k B_{ik}A_{ki}$. These give the same
thing. Now

$$\text{trace}(A) = \text{trace}(S^{-1}BS) = \text{trace}(BSS^{-1}) = \text{trace}(B).$$
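The two facts in 53 can be confirmed on a small example. This is only an illustration: the matrices `A`, `B`, `S` and its inverse are invented, and checking one example is of course not a proof.

```python
def mat_mul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def trace(M):
    """Sum of the diagonal entries."""
    return sum(M[i][i] for i in range(len(M)))

# Hypothetical 2x2 matrices.
A = [[1, 2], [3, 4]]
B = [[0, 1], [5, -2]]

# trace(AB) and trace(BA) agree even though AB != BA.
t_ab = trace(mat_mul(A, B))
t_ba = trace(mat_mul(B, A))

# Similar matrices have the same trace: S_inv is the inverse of S.
S = [[1, 1], [0, 1]]
S_inv = [[1, -1], [0, 1]]
similar = mat_mul(S_inv, mat_mul(B, S))
```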

C.14 Exercises 381 




5.3191 x 1(T 2 
2 I 7.446 8 x 1(T 2 
0.712 77 

0.143 94 

5 | 0.939 39 

0.280 3 



5 Eigenvalue near $-1.35$: $\lambda = -1.341$,

$(1.0, \ -0.45606, \ -0.47632)^T$

Eigenvalue near 1.5: $\lambda = 1.6790$,

$(0.86741, \ 5.5869, \ -3.5282)^T$

Eigenvalue near 6.5: $\lambda = 6.662$,

$(4.4052, \ 3.2136, \ 6.1717)^T$

8 Eigenvalue near $-1$: $\lambda = -0.70369$,

$(3.3749, \ -1.2653, \ 0.15575)^T$

Eigenvalue near .25: $\lambda = 0.18911$,

$(-0.24220, \ -0.52291, \ 1.0)^T$

Eigenvalue near 7.5: $\lambda = 7.5146$,

$(0.34692, \ 1.0, \ 0.60692)^T$



10 



A 



22 

y 



<\V2 



12 From the bottom line, a lower bound is 
the second line, an upper bound is 12. 



TO From 



0.205 21 

0.11726 

-2.605 9 x 10~ 2 

7 It indicates that they are no good for doing it. 



C.15 Exercises 414 

1 The actual largest eigenvalue is 8 with correspond- 
ing eigenvector 1 

V i 

4 The largest eigenvalue is $-16$ and an eigenvector is
$(1, \ -2, \ 1)^T$.



C.16 Exercises 420 

1 The hint is a good suggestion. Pick the first thing
in $S$. By the Archimedean property, $S \neq \emptyset$. That
is, $km > n$ for all $k$ sufficiently large. Call this first
thing $q + 1$. Thus $n - (q + 1)m < 0$ but $n - qm \geq 0$.
Then

$$n - qm < m$$

and so

$$0 \leq r \equiv n - qm < m.$$

2 First note that either $m$ or $-m$ is in $S$ so $S$ is a
nonempty set of positive integers. By well ordering,
there is a smallest element of $S$, called $p = x_0m + y_0n$.
Either $p$ divides $m$ or it does not. If $p$ does not
divide $m$, then by the above problem,

$$m = pq + r$$

where $0 < r < p$. Thus

$$m = (x_0m + y_0n)q + r$$

and so, solving for $r$,

$$r = m(1 - x_0q) + (-y_0q)n \in S.$$

However, this is a contradiction because $p$ was the
smallest element of $S$. Thus $p|m$. Similarly $p|n$. Now
suppose $q$ divides both $m$ and $n$. Then $m = qx$ and
$n = qy$ for integers $x$ and $y$. Therefore,

$$p = mx_0 + ny_0 = x_0qx + y_0qy = q(x_0x + y_0y)$$

showing $q|p$. Therefore, $p = (m, n)$.
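The proof above shows that suitable $x_0, y_0$ exist but does not say how to find them. One standard way, which is not the well ordering argument used in the answer, is the extended Euclidean algorithm. A sketch, with an illustrative function name:

```python
def bezout(m, n):
    """Return (p, x0, y0) with p = gcd(m, n) = x0*m + y0*n, for positive m, n."""
    # Invariant: old_r = old_x*m + old_y*n and r = x*m + y*n.
    old_r, r = m, n
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y

p, x0, y0 = bezout(12, 18)
```

The coefficients are not unique; this routine returns one valid pair.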

3 Suppose $r$ is the greatest common divisor of $p$ and
$m$. Then if $r \neq 1$, it must equal $p$ because it must
divide $p$. Hence there exist integers $x$, $y$ such that

$$p = xp + ym$$

which requires that $p$ must divide $m$ which is assumed
not to happen. Hence $r = 1$ and so the two
numbers are relatively prime.

4 The only substantive issue is why $\mathbb{Z}_p$ is a field. Let
$[x] \in \mathbb{Z}_p$ where $[x] \neq [0]$. Thus $x$ is not a multiple
of $p$. Then from the above problem, $x$ and $p$ are
relatively prime. Hence from another of the above
problems, there exist integers $a$, $b$ such that

$$1 = ap + bx$$

Then

$$[1 - bx] = [ap] = 0$$

and it follows that

$$[b][x] = [1]$$

so $[b] = [x]^{-1}$.
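As a quick numerical illustration, Python (version 3.8 or later) can compute the inverse class $[b]$ directly with the built-in three-argument `pow` and a negative exponent. The prime 7 and the function name below are chosen only for illustration.

```python
def inverses_mod(p):
    """Map each nonzero class x in Z_p to the b with b*x = 1 (mod p)."""
    return {x: pow(x, -1, p) for x in range(1, p)}

table = inverses_mod(7)

# Every nonzero class times its listed inverse is the class of 1.
all_invert = all((x * b) % 7 == 1 for x, b in table.items())
```

For a modulus that is not prime, `pow(x, -1, n)` raises `ValueError` whenever `x` and `n` share a factor, which is the computational face of the statement that such a ring is not a field.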



C.17 Exercises 444 

1 No. $(1, 0, 0, 0) \in M$ but $10(1, 0, 0, 0) \notin M$.

3 If not, you could add in a vector not in their span 
and obtain 6 vectors which are linearly independent. 
This cannot occur thanks to the exchange theorem. 

10 For each $x \in [a, b]$, let $f_x(x) = 1$ and $f_x(y) = 0$
if $y \neq x$. Then these vectors are obviously linearly
independent.



12 A field also has multiplication. However, you can
consider the elements of the field as vectors and then
it satisfies all the vector space axioms. When you
multiply a number (vector) in $\mathbb{R}$ by a scalar in $\mathbb{Q}$
you get something in $\mathbb{R}$. All the axioms for a vector
space are now obvious. For example, if $a \in \mathbb{Q}$ and
$x, y \in \mathbb{R}$,

$$a(x + y) = ax + ay$$

from the distributive law on $\mathbb{R}$.

13 Simply let $f(i)$ be the $i^{th}$ component of a vector $x \in
F^n$. Thus a typical thing in $F^n$ is $(f(1), \cdots, f(n))$.

14 Say for some $n$, $\sum_{k=1}^n c_kf_k = 0$, the zero function.
Then pick $i$,

$$0 = \left(\sum_{k=1}^n c_kf_k\right)(i) = c_if_i(i) = c_i$$

Since $i$ was arbitrary, this shows these vectors are
linearly independent.



15 Say

$$\sum_{k=1}^n c_ky_k = 0$$

Then taking derivatives you have

$$\sum_{k=1}^n c_ky_k^{(j)} = 0, \quad j = 0, 1, 2, \cdots, n - 1$$

This must hold when each equation is evaluated at
$x$ where you can pick the $x$ at which the above determinant
is nonzero. Therefore, this is a system of
$n$ equations in $n$ variables, the $c_i$, and the coefficient
matrix is invertible. Therefore, each $c_i = 0$.

19 Which are linearly independent? 

(a) These are linearly independent. 

(b) These are also linearly independent. 

21 This is obvious because when you add two of these
you get one and when you multiply one of these by
a scalar, you get another one. A basis is $\{1, \sqrt{2}\}$.
By definition, the span of these gives the collection
of vectors. Are they independent? Say $a + b\sqrt{2} = 0$
where $a$, $b$ are rational numbers. If $a \neq 0$, then
$b\sqrt{2} = -a$ which can't happen since $a$ is rational.
If $b \neq 0$, then $-a = b\sqrt{2}$ which again can't happen
because on the left is a rational number and on the
right is an irrational. Hence both $a, b = 0$ and so
this is a basis.
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29 Consider the claim about ma. 

$$1e^{\ln(\sigma)} + (-1)\sigma e^0 = 0$$

The equation shown does hold from the definition
of $\ln \sigma$. However, if $\ln \sigma$ were algebraic, then $e^{\ln \sigma}$, $e^0$
would be linearly dependent with field of scalars
equal to the algebraic numbers, contrary to the Lindemann
Weierstrass theorem. The other instances
are similar. In the case of $\cos \sigma$, you could use the
identity

$$\frac{1}{2}e^{i\sigma} + \frac{1}{2}e^{-i\sigma} - e^0\cos\sigma = 0$$

contradicting independence of $e^{i\sigma}$, $e^{-i\sigma}$, $e^0$.






^2f(x k )g(x k ) 



k=0 



1/2 



< (Ei/(^)i 2 ) (Ei^)i 2 



u k w k 




where u = £) fe u k v k and w = ]T fc w k v k . 



y^^kh 



k=i 



1/2 



< E^i 2 El 6 * 



\k=l 



\k=l 



1/2 



1/2 



1/2 



C.18 Exercises 468 

1 I will show one of these. Verify that Examples 16.5.1 
- 16.5.4 are each inner product spaces. 

First consider Example 16.5.1. All of the axioms of
the inner product are obvious except one, the one
which says that if $(f, f) = 0$ then $f = 0$. This one
depends on continuity of the functions. Suppose
then that it is not true. In other words, $(f, f) = 0$
and yet $f \neq 0$. Then for some $x \in I$, $f(x) \neq 0$.
By continuity, there exists $\delta > 0$ such that if $y \in
I \cap (x - \delta, x + \delta) \equiv I_\delta$, then

$$|f(y) - f(x)| < |f(x)|/2$$

It follows that for $y \in I_\delta$,

$$|f(y)| > |f(x)| - |f(x)/2| = |f(x)|/2.$$

Hence

$$(f, f) \geq \int_{I_\delta}|f(y)|^2p(y)\,dy \geq \left(|f(x)|/2\right)^2(\text{length of } I_\delta)(\min(p)) > 0,$$

a contradiction. Note that $\min p > 0$ because $p$
is a continuous function defined on a closed and
bounded interval and so it achieves its minimum by
the extreme value theorem of calculus.

$$\left|\int_I f(x)g(x)p(x)\,dx\right| \leq \left(\int_I|f(x)|^2p(x)\,dx\right)^{1/2}\left(\int_I|g(x)|^2p(x)\,dx\right)^{1/2}$$



5 It might be the case that $(z, z) = 0$ and yet $z \neq 0$.
Just let $z = (z_1, \cdots, z_n)$ where exactly $p$ of the $z_i$
equal 1 but the remaining are equal to 0. Then
$(z, z)$ would reduce to 0 in the integers mod $p$. Another
problem is the failure to have an order on $\mathbb{Z}_p$.
Consider first $\mathbb{Z}_2$. Is 1 positive or negative? If it is
positive, then $1 + 1$ would need to be positive. But
$1 + 1 = 0$ in this case. If 1 is negative, then $-1$
is positive, but $-1$ is equal to 1. Thus 1 would be
both positive and negative. You can consider the
general case where $p > 2$ also. Simply take $a \neq 1$.
If $a$ is positive, then consider $a, a^2, a^3, \cdots$. These
would all have to be positive. However, eventually
a repeat will take place. Thus $a^n = a^m$, $m < n$, and
so $a^m(a^k - 1) = 0$ where $k = n - m$. Since $a^m \neq 0$,
it follows that $a^k = 1$ for a suitable $k$. It follows
that the sequence of powers of $a$ must include each
of $\{1, 2, \cdots, p - 1\}$ and all these would therefore
be positive. However, $1 + (p - 1) = 0$, contradicting
the assertion that $\mathbb{Z}_p$ can be ordered. So what
would you mean by saying $(z, z) \geq 0$? The Cauchy
Schwarz inequality would not even apply.

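7 In an inner product space, an open ball is the set
$B(x, r) = \{y : |y - x| < r\}$. Let $z \in B(x, r)$ and let
$\delta = r - |z - x|$. Then if $y \in B(z, \delta)$,

$$|y - x| \leq |y - z| + |z - x| < \delta + |z - x| = r - |z - x| + |z - x| = r$$

and so $B(z, \delta) \subseteq B(x, r)$.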



( 



\V2 



\y/2y/Zz 



V^VEx 2 



9 Let $y$ go with $\lambda$ and $z$ go with $\mu$.

$$z(p(x)y')' + (\lambda q(x) + r(x))yz = 0$$
$$y(p(x)z')' + (\mu q(x) + r(x))zy = 0$$

Subtract.

$$z(p(x)y')' - y(p(x)z')' + (\lambda - \mu)q(x)yz = 0$$

Now integrate from $a$ to $b$. First note that

$$z(p(x)y')' - y(p(x)z')' = \frac{d}{dx}\left(p(x)y'z - p(x)z'y\right)$$

and so what you get is

$$p(b)y'(b)z(b) - p(b)z'(b)y(b) - \left(p(a)y'(a)z(a) - p(a)z'(a)y(a)\right)$$
$$+ (\lambda - \mu)\int_a^b q(x)y(x)z(x)\,dx = 0$$

Look at the stuff on the top line. From the assumptions
on the boundary conditions,

$$C_1y(a) + C_2y'(a) = 0$$
$$C_1z(a) + C_2z'(a) = 0$$



-3-2-10123 



13 ELi^(-l)^os(te) + ^ 



and so

$$y(a)z'(a) - y'(a)z(a) = 0$$

Similarly,

$$y(b)z'(b) - y'(b)z(b) = 0$$

Hence, that stuff on the top line equals zero and so
the orthogonality condition holds.

-,-, ^ 5 2(-l) fc+1 • /, x 

ii E k =ik — sin N 



3-2-10123 




15 IEr=i^y^l < (EILi^ 7 

16 



"^ (Ei=iK 



2 z) 1/2 



$$e^{i(t/2)}D_n(t) = \frac{1}{2\pi}\sum_{k=-n}^{n}e^{i(k + (1/2))t}$$

$$e^{-i(t/2)}D_n(t) = \frac{1}{2\pi}\sum_{k=-n}^{n}e^{i(k - (1/2))t} = \frac{1}{2\pi}\sum_{k=-(n+1)}^{n-1}e^{i(k + (1/2))t}$$



12 ^-ELo^)^ cos (( 2& + 1 ) a; ) 






$$D_n(t)\left(e^{i(t/2)} - e^{-i(t/2)}\right) = \frac{1}{2\pi}\left(e^{i(n + (1/2))t} - e^{-i(n + (1/2))t}\right)$$






$$D_n(t)\,2i\sin(t/2) = \frac{1}{2\pi}\,2i\sin\left(\left(n + \frac{1}{2}\right)t\right)$$

$$D_n(t) = \frac{1}{2\pi}\frac{\sin\left(t\left(n + \frac{1}{2}\right)\right)}{\sin(t/2)}$$

You know that $t \to D_n(t)$ is periodic of period $2\pi$.
Therefore, if $f(y) = 1$,

$$\int_{-\pi}^{\pi}D_n(x - y)\,dy = \int_{-\pi}^{\pi}D_n(t)\,dt$$

However, it follows directly from computation that
$S_nf(x) = 1$. Just take the integral of the sum which
defines $D_n$.
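The closed form for $D_n$ can be compared numerically with the defining sum, taking the definition to be $D_n(t) = \frac{1}{2\pi}\sum_{k=-n}^{n}e^{ikt}$, which is what the computation above indicates. A rough sketch with illustrative function names:

```python
import cmath
import math

def dirichlet_sum(n, t):
    """D_n(t) from the sum of complex exponentials (the imaginary parts cancel)."""
    total = sum(cmath.exp(1j * k * t) for k in range(-n, n + 1))
    return total.real / (2 * math.pi)

def dirichlet_closed(n, t):
    """D_n(t) from the closed form; requires sin(t/2) != 0."""
    return math.sin((n + 0.5) * t) / (2 * math.pi * math.sin(t / 2))

# The two expressions should agree up to rounding error.
gap = abs(dirichlet_sum(5, 0.7) - dirichlet_closed(5, 0.7))
```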



17 From Lemma 16.5.11 and Theorem 16.5.12,

$$\left(y - \sum_{k=1}^n(y, u_k)u_k, \ w\right) = 0$$

for all $w \in \text{span}\left(\{u_i\}_{i=1}^n\right)$. Therefore,

$$|y|^2 = \left|y - \sum_{k=1}^n(y, u_k)u_k + \sum_{k=1}^n(y, u_k)u_k\right|^2$$

Now if $(u, v) = 0$, then you can see right away from
the definition that

$$|u + v|^2 = |u|^2 + |v|^2$$

Applying this to

$$u = y - \sum_{k=1}^n(y, u_k)u_k, \quad v = \sum_{k=1}^n(y, u_k)u_k,$$

the above equals

$$= \left|y - \sum_{k=1}^n(y, u_k)u_k\right|^2 + \left|\sum_{k=1}^n(y, u_k)u_k\right|^2$$

$$= \left|y - \sum_{k=1}^n(y, u_k)u_k\right|^2 + \sum_{k=1}^n|(y, u_k)|^2,$$

the last step following because of similar reasoning
to the above and the assumption that the $u_k$
are orthonormal. It follows the sum $\sum_{k=1}^{\infty}|(y, u_k)|^2$
converges and so $\lim_{k \to \infty}(y, u_k) = 0$ because if a
series converges, then the $k^{th}$ term must converge
to 0.



18 Let $f$ be any piecewise continuous function which
is bounded on $[-\pi, \pi]$. Show, using the above problem,
that

$$\lim_{n \to \infty}\int_{-\pi}^{\pi}f(t)\sin(nt)\,dt = \lim_{n \to \infty}\int_{-\pi}^{\pi}f(t)\cos(nt)\,dt = 0$$

Let the inner product space consist of piecewise continuous
bounded functions with the inner product
defined by

$$(f, g) \equiv \int_{-\pi}^{\pi}f(x)\overline{g(x)}\,dx$$

Then, from the above problem and the fact shown
earlier that $\left\{\frac{1}{\sqrt{2\pi}}e^{ikx}\right\}_{k \in \mathbb{Z}}$ form an orthonormal set
of vectors in this inner product space, it follows that

$$\lim_{n \to \infty}\left(f, e^{inx}\right) = 0$$

Without loss of generality, assume that $f$ has real
values. Then the above limit reduces to having both
the real and imaginary parts converge to 0. This
implies the thing which was desired. Note also that
if $\alpha \in [-1, 1]$, then

$$\lim_{n \to \infty}\int_{-\pi}^{\pi}f(t)\sin((n + \alpha)t)\,dt = \lim_{n \to \infty}\int_{-\pi}^{\pi}f(t)\left[\sin(nt)\cos(\alpha t) + \cos(nt)\sin(\alpha t)\right]dt = 0$$

because $f(t)\cos(\alpha t)$ and $f(t)\sin(\alpha t)$ are again bounded and piecewise continuous.

19 From the definition of $D_n$,

$$S_nf(x) = \int_{-\pi}^{\pi}f(x - y)D_n(y)\,dy$$

Now observe that $D_n$ is an even function. Therefore,
the formula equals

$$S_nf(x) = \int_0^{\pi}f(x - y)D_n(y)\,dy + \int_{-\pi}^0f(x - y)D_n(y)\,dy$$

$$= \int_0^{\pi}f(x - y)D_n(y)\,dy + \int_0^{\pi}f(x + y)D_n(y)\,dy$$

$$= \int_0^{\pi}\frac{f(x + y) + f(x - y)}{2}\,2D_n(y)\,dy$$

Now note that $\int_0^{\pi}2D_n(y)\,dy = 1$ because

$$\int_{-\pi}^{\pi}D_n(y)\,dy = 1$$

and $D_n$ is even. Therefore,

$$\left|S_nf(x) - \frac{f(x+) + f(x-)}{2}\right| = \left|\int_0^{\pi}2D_n(y)\frac{f(x + y) - f(x+) + f(x - y) - f(x-)}{2}\,dy\right|$$

From the formula for $D_n(y)$ given earlier, this is
dominated by an expression of the form

$$C\left|\int_0^{\pi}\frac{f(x + y) - f(x+) + f(x - y) - f(x-)}{\sin(y/2)}\sin((n + 1/2)y)\,dy\right|$$

for a suitable constant $C$. The above is equal to

$$C\left|\int_0^{\pi}\frac{y}{\sin(y/2)}\cdot\frac{f(x + y) - f(x+) + f(x - y) - f(x-)}{y}\sin((n + 1/2)y)\,dy\right|$$

and the expression $\frac{y}{\sin(y/2)}$ equals a bounded continuous
function on $[0, \pi]$ except at 0 where it is
undefined. This follows from elementary calculus.
Therefore, changing the function at this single point
does not change the integral and so we can consider
this as a continuous bounded function defined on
$[0, \pi]$. Also, from the assumptions on $f$,

$$y \to \frac{f(x + y) - f(x+) + f(x - y) - f(x-)}{y}$$

is equal to a piecewise continuous function on $[0, \pi]$
except at the point 0. Therefore, the above integral
converges to 0 by the previous problem. This shows
that the Fourier series generally tries to converge to
the midpoint of the jump.



20

$$\frac{\pi}{4} = \lim_{n \to \infty}\sum_{k=1}^n\frac{(-1)^{k+1}}{2k - 1}$$

You could also find the Fourier series for $x^2$ instead
of $x$ and get

$$\lim_{n \to \infty}\sum_{k=1}^n\frac{4}{k^2}(-1)^k\cos(kx) + \frac{\pi^2}{3} = x^2$$

because the periodic extension of this function is
continuous. Let $x = 0$:

$$\frac{\pi^2}{3} = \lim_{n \to \infty}\sum_{k=1}^n\frac{4}{k^2}(-1)^{k+1}$$

and so

$$\frac{\pi^2}{12} = \sum_{k=1}^{\infty}\frac{(-1)^{k+1}}{k^2}$$

This is one of those calculus problems where you
show it converges absolutely by the comparison test
with a p series. However, here is what it converges
to.
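Both limits can be sanity-checked with partial sums. This is only a numerical illustration (function names are made up); for an alternating series with decreasing terms the error after $n$ terms is at most the size of the next term.

```python
import math

def leibniz(n):
    """Partial sum of sum_{k>=1} (-1)^(k+1) / (2k - 1), which tends to pi/4."""
    return sum((-1) ** (k + 1) / (2 * k - 1) for k in range(1, n + 1))

def alternating_squares(n):
    """Partial sum of sum_{k>=1} (-1)^(k+1) / k^2, which tends to pi^2/12."""
    return sum((-1) ** (k + 1) / k ** 2 for k in range(1, n + 1))

err_first = abs(leibniz(1000) - math.pi / 4)
err_second = abs(alternating_squares(1000) - math.pi ** 2 / 12)
```

The first series converges slowly (error of order $1/n$), the second much faster (error of order $1/n^2$).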

21 Consider for $t \in [0, 1]$ the following.

$$|y - (x + t(w - x))|^2$$

where $w \in K$ and $x \in K$. It equals

$$f(t) = |y - x|^2 + t^2|w - x|^2 - 2t\,\text{Re}(y - x, w - x)$$

Suppose $x$ is the point of $K$ which is closest to $y$.
Then $f'(0) \geq 0$. However,

$$f'(0) = -2\,\text{Re}(y - x, w - x).$$

Therefore, if $x$ is closest to $y$,

$$\text{Re}(y - x, w - x) \leq 0.$$

Next suppose this condition holds. Then you have

$$|y - (x + t(w - x))|^2 \geq |y - x|^2 + t^2|w - x|^2 \geq |y - x|^2$$

By convexity of $K$, a generic point of $K$ is of the
form $x + t(w - x)$ for $w \in K$. Hence $x$ is the closest
point.



22

$$|x + y|^2 + |x - y|^2 = |x|^2 + |y|^2 + 2\,\text{Re}(x, y) + |x|^2 + |y|^2 - 2\,\text{Re}(x, y) = 2|x|^2 + 2|y|^2$$

Of course the same reasoning yields

$$\frac{1}{4}\left(|x + y|^2 - |x - y|^2\right) = \frac{1}{4}\left(|x|^2 + |y|^2 + 2(x, y) - \left(|x|^2 + |y|^2 - 2(x, y)\right)\right) = (x, y)$$
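Both identities are easy to test on a real example. The vectors below are arbitrary and the helper names are illustrative; integer entries keep the arithmetic exact.

```python
def inner(x, y):
    """Real dot product."""
    return sum(a * b for a, b in zip(x, y))

def norm_sq(x):
    return inner(x, x)

# Hypothetical real vectors.
x = [1, 2, 3]
y = [4, -5, 6]
s = [a + b for a, b in zip(x, y)]   # x + y
d = [a - b for a, b in zip(x, y)]   # x - y

parallelogram_lhs = norm_sq(s) + norm_sq(d)
parallelogram_rhs = 2 * norm_sq(x) + 2 * norm_sq(y)
polarization = (norm_sq(s) - norm_sq(d)) / 4
```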
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23 Let Xfc be a minimizing sequence. The connection 
between x& and c k G ¥ k is obvious because the {uk} 
are orthonormal. That is, 



\wp • 



where x = J2 7 - c j u j- Use the parallelogram identity. 



x fe - (y - X m) 



(y- x m) 



y- x /c 

2 


2 

+ 2 


y- x m 

2 



Hence 



■Xl. 



■Xfe| 
|2 



1 \ |2 

■o ly- x ml 



X/e ~r X^ 



< 



2 

|2 



x z 



Now the right hand side converges to 0 since $\{x_k\}$ is
a minimizing sequence. Therefore, $\{x_k\}$ is a Cauchy
sequence in $U$. Hence the sequence of component
vectors $\{c^k\}$ is a Cauchy sequence in $F^n$ and so
it converges thanks to completeness of $F$. It follows
that $\{x_k\}$ also must converge to some $x$. Then since
$K$ is closed, it follows that $x \in K$. Hence



24 



A 



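$$(Px - Py, y - Py) \leq 0$$
$$(Py - Px, x - Px) \leq 0$$

Thus

$$(Px - Py, x - Px) \geq 0$$

Hence

$$(Px - Py, x - Px) - (Px - Py, y - Py) \geq 0$$

and so

$$(Px - Py, x - y - (Px - Py)) \geq 0$$

Then, by the Cauchy Schwarz inequality,

$$|x - y||Px - Py| \geq (Px - Py, x - y) \geq (Px - Py, Px - Py) = |Px - Py|^2$$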



25 Let $\{u_k\}_{k=1}^n$ be a basis for $V$ and if $x \in V$, let $x_i$
be the components of $x$ relative to this basis. Thus
the $x_i$ are defined according to

$$\sum_i x_iu_i = x$$

Then decree that $\{u_i\}$ is an orthonormal basis. It
follows

$$|x|^2 = \sum_i|x_i|^2.$$

Now letting $\{x_k\}$ be a sequence of vectors of $V$ let
$\{x^k\}$ denote the sequence of component vectors in
$F^n$. One direction is easy, saying that $\|x\| \leq \Delta|x|$.
If this is not so, then there exists a sequence of
vectors $\{x_k\}$ such that

$$\|x_k\| > k|x_k|$$

Dividing both sides by $\|x_k\|$ it can be assumed that
$1 > k|x_k| = k|x^k|$. Hence $x^k \to 0$ in $F^n$. But from
the triangle inequality,

$$\|x_k\| \leq \sum_{i=1}^n|x_i^k|\|u_i\|$$

Therefore, since $\lim_{k \to \infty}x_i^k = 0$, this is a contradiction
to each $\|x_k\| = 1$. It follows that there exists
$\Delta$ such that for all $x$,

$$\|x\| \leq \Delta|x|$$

Now consider the other direction. If it is not true,
then there exists a sequence $\{x_k\}$ such that

$$\frac{1}{k}|x_k| > \|x_k\|$$

Dividing both sides by $|x_k|$, it can be assumed that
$|x_k| = |x^k| = 1$. Hence, by compactness of the
closed unit ball in $F^n$, there exists a further subsequence,
still denoted by $k$, such that $x^k \to a \in F^n$
and it also follows that $|a|_{F^n} = 1$. Also the above
inequality implies $\lim_{k \to \infty}\|x_k\| = 0$. Therefore,

$$\sum_{j=1}^n a_ju_j = \lim_{k \to \infty}\sum_{j=1}^n x_j^ku_j = \lim_{k \to \infty}x_k = 0$$

which is a contradiction to the $u_j$ being linearly
independent. Therefore, there exists $\delta > 0$ such
that for all $x$,

$$\delta|x| \leq \|x\|.$$

Now if you have any other norm on this finite dimensional
vector space, say $|||\cdot|||$, then from what
was just shown, there exist scalars $\delta_i$ and $\Delta_i$, all
positive, such that

$$\delta_1|x| \leq \|x\| \leq \Delta_1|x|$$
$$\delta_2|x| \leq |||x||| \leq \Delta_2|x|$$

It follows that

$$|||x||| \leq \Delta_2|x| \leq \frac{\Delta_2}{\delta_1}\|x\| \leq \frac{\Delta_2\Delta_1}{\delta_1}|x| \leq \frac{\Delta_2\Delta_1}{\delta_1\delta_2}|||x|||$$

Hence

$$\frac{\delta_1}{\Delta_2}|||x||| \leq \|x\| \leq \frac{\Delta_1}{\delta_2}|||x|||$$

In other words, any two norms on a finite dimensional
vector space are equivalent norms. What this
means is that every consideration which depends on
analysis or topology is exactly the same for any two
norms. What might change are geometric properties
of the norms.

C.19 Exercises 500 



1 

1 

2 

hV3 



IV3 

1 

2 



Wz 

1 

2 



5V2 



hV2 



r V2\/3-i\/2 



-\V2VZ-\y/2 

l\/2-i\/2V3 



_i 

\3* 



\V3 

J 1 
2 



8 Let $f \in \mathcal{L}(V, F)$.

(a) If $f = 0$, the zero mapping, then $fv = (v, 0)$
for all $v \in V$.

$$(v, 0) = (v, 0 + 0) = (v, 0) + (v, 0)$$

so $(v, 0) = 0$.

(b) If $f \neq 0$ then there exists $z \neq 0$ satisfying
$(u, z) = 0$ for all $u \in \ker(f)$.

$\ker(f)$ is a subspace and so there exists $z_1 \notin
\ker(f)$. Then there exists a closest point of
$\ker(f)$ to $z_1$ called $x$. Then let $z = z_1 - x$.
Thus $(u, z) = 0$ for all $u \in \ker(f)$.

(c) $f(f(y)z - f(z)y) = f(y)f(z) - f(z)f(y) = 0$.

(d)

$$0 = (f(y)z - f(z)y, z) = f(y)|z|^2 - f(z)(y, z)$$

and so

$$f(y) = \left(y, \frac{\overline{f(z)}}{|z|^2}z\right)$$

so $w = \frac{\overline{f(z)}}{|z|^2}z$ appears to work.

(e) If $w_1, w_2$ both work, then for every $y$,

$$0 = f(y) - f(y) = (y, w_1) - (y, w_2) = (y, w_1 - w_2)$$

Now let $y = w_1 - w_2$ and so $w_1 = w_2$.

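9 It is required to show that $A^*$ is linear.

$$(y, A^*(\alpha z + \beta w)) = (Ay, \alpha z + \beta w)$$
$$= \bar{\alpha}(Ay, z) + \bar{\beta}(Ay, w)$$
$$= \bar{\alpha}(y, A^*z) + \bar{\beta}(y, A^*w)$$
$$= (y, \alpha A^*z) + (y, \beta A^*w)$$
$$= (y, \alpha A^*z + \beta A^*w)$$

Since $y$ is arbitrary, this shows that $A^*$ is linear. In
case $A$ is an $m \times n$ matrix as described,

$$A^* = \overline{(A^T)}$$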

11 The two operators $D + 1$ and $D + 4$ commute and
are each one to one on the kernel of the other. Also,
it is obvious that $\ker(D + a)$ consists of functions
of the form $Ce^{-at}$. Therefore, $\ker(D + 1)(D + 4)$
consists of functions of the form

$$y = C_1e^{-t} + C_2e^{-4t}$$

where $C_1$, $C_2$ are arbitrary constants. In other words,
a basis for $\ker(D + 1)(D + 4)$ is $\{e^{-t}, e^{-4t}\}$.
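As a direct check, one can expand the product of the two operators (they commute, so the order does not matter) and substitute each proposed basis function:

```latex
(D + 1)(D + 4)y = y'' + 5y' + 4y,
\qquad
\begin{aligned}
y = e^{-t}:  &\quad e^{-t} - 5e^{-t} + 4e^{-t} = 0, \\
y = e^{-4t}: &\quad 16e^{-4t} - 20e^{-4t} + 4e^{-4t} = 0.
\end{aligned}
```

Both functions are annihilated, and they are linearly independent because neither is a constant multiple of the other.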

14 

/ 1 2 2 \ 

14 6 

16 

\ 1 J 

17 It is obvious that $x \sim x$. If $x \sim y$, then $y \sim x$ is
also clear. If $x \sim y$ and $y \sim z$, then

$$z - x = (z - y) + (y - x)$$

and by assumption, both $z - y$ and $y - x \in \ker(L)$
which is a subspace. Therefore, $z - x \in \ker(L)$
also and so $\sim$ is an equivalence relation. Are the
operations well defined? If $[x] = [x']$, $[y] = [y']$,
is it true that $[x + y] = [y' + x']$? Of course, $x' +
y' - (x + y) = (x' - x) + (y' - y) \in \ker(L)$ because
$\ker(L)$ is a subspace. Similar reasoning applies to
the case of scalar multiplication. Now why is $A$ well
defined? If $[x] = [x']$, is $Lx = Lx'$? Of course this
is so. $x - x' \in \ker(L)$ by assumption. Therefore,
$Lx = Lx'$. It is clear also that $A$ is linear. If $A[x] =
0$, then $Lx = 0$ and so $x \in \ker(L)$ and so $[x] = 0$.
Therefore, $A$ is one to one. It is obviously onto
$L(V) = W$.

19 An easy way to do this is to "unravel" the powers of
the matrix making vectors in $F^{n^2}$ and then making
these the columns of an $n^2 \times n$ matrix. Look for linear
relationships between the columns by obtaining the
row reduced echelon form and using Lemma 8.2.5.
As an example, consider the following matrix.



for linear relationships. 




Let's find its minimal polynomial. We have the following
powers

By the Cayley Hamilton theorem, I won't need to
consider any higher powers than this. Now I will
unravel each and make them the columns of a matrix.



/ 



V 



1 


1 





-3 





1 


1 


-1 








-1 


-4 





-1 


-3 


-7 


1 





-2 


-6 





-1 


-3 


-7 





2 


7 


18 





1 


5 


15 


1 


3 


8 


19 



\ 



/ 



Next you can do row operations and obtain the row
reduced echelon form for this matrix and then look
for linear relationships.

$$\begin{pmatrix} 1 & 0 & 0 & 2 \\ 0 & 1 & 0 & -5 \\ 0 & 0 & 1 & 4 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}$$

From this and Lemma 8.2.5, you see that for $A$ denoting
the matrix,

$$A^3 = 4A^2 - 5A + 2I$$

and so the minimal polynomial is

$$\lambda^3 - 4\lambda^2 + 5\lambda - 2$$

No smaller degree polynomial can work either. Since
it is of degree 3, this is also the characteristic polynomial.
Note how we got this without expanding
any determinants or solving any polynomial equations.
If you factor this polynomial, you get

$\lambda^3 - 4\lambda^2 + 5\lambda - 2 = (\lambda - 2)(\lambda - 1)^2$ so this is an
easy problem, but you see that this procedure for
finding the minimal polynomial will work even when
you can't factor the characteristic polynomial.
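The unravelling procedure can be sketched in code. The matrix of the worked example did not survive in this copy, so the test matrix below is a stand-in chosen to have the same minimal polynomial $\lambda^3 - 4\lambda^2 + 5\lambda - 2$; it is not the book's matrix. The function names are illustrative. Instead of reading relations off a row reduced echelon form, the sketch asks, for each degree in turn, whether the unravelled $A^d$ is a combination of the lower unravelled powers, which amounts to the same search.

```python
from fractions import Fraction

def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def unravel(m):
    """Flatten a matrix row by row into one long vector."""
    return [entry for row in m for entry in row]

def solve_exact(columns, target):
    """Return c with sum_j c[j]*columns[j] == target, or None if impossible."""
    d, rows = len(columns), len(target)
    aug = [[Fraction(columns[j][i]) for j in range(d)] + [Fraction(target[i])]
           for i in range(rows)]
    for col in range(d):
        pivot = next((r for r in range(col, rows) if aug[r][col] != 0), None)
        if pivot is None:
            return None                      # columns are dependent
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(rows):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    # Rows below the pivots are zero on the left; they must be zero on the right.
    if any(aug[r][d] != 0 for r in range(d, rows)):
        return None
    return [aug[j][d] for j in range(d)]

def minimal_polynomial(a):
    """Coefficients [c_0, ..., c_{d-1}, 1] of the monic minimal polynomial."""
    n = len(a)
    identity = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    powers = [identity]
    for _ in range(n):                       # degree at most n by Cayley Hamilton
        nxt = mat_mul(powers[-1], a)
        coeffs = solve_exact([unravel(p) for p in powers], unravel(nxt))
        if coeffs is not None:
            # A^d = sum_j coeffs[j] A^j, so move everything to one side.
            return [-c for c in coeffs] + [Fraction(1)]
        powers.append(nxt)
    raise ValueError("no relation found; not expected for a square matrix")

# Stand-in matrix (an assumption, not taken from the text).
example = [[2, 0, 0], [0, 1, 1], [0, 0, 1]]
poly = minimal_polynomial(example)
```

Exact rational arithmetic is used because deciding whether a relation holds is unreliable in floating point.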

20 If two matrices are similar, then they must have the
same minimal polynomial. This is obvious from the
fact that for $p(\lambda)$ any polynomial and $A = S^{-1}BS$,

$$p(A) = S^{-1}p(B)S$$

So what is the minimal polynomial of the diagonal
matrix shown? It is obviously $\prod_{i=1}^{r}(\lambda - \lambda_i)$, the product
being over the distinct diagonal entries. Thus
there are no repeated roots.

21 Show that if A is an n x n matrix and the minimal 
polynomial has no repeated roots, then A is non 
defective and there exists a basis of eigenvectors. 
Thus, from the above problem, a matrix may be 
diagonalized if and only if its minimal polynomial 
has no repeated roots. It turns out this condition 
is something which is relatively easy to determine. 
Hint: You might want to use Theorem 17.3.1. 

If $A$ has a minimal polynomial which has no repeated
roots, say

\[ p(\lambda) = \prod_{i=1}^{m} (\lambda - \lambda_i), \]

then from the material on decomposing into direct
sums of generalized eigenspaces, you have

\[ \mathbb{F}^n = \ker(A - \lambda_1 I) \oplus \ker(A - \lambda_2 I) \oplus \cdots \oplus \ker(A - \lambda_m I) \]

and by definition, the basis vectors for each $\ker(A - \lambda_i I)$
are all eigenvectors. Thus $\mathbb{F}^n$ has a basis of eigenvectors
and so $A$ is diagonalizable, that is, non defective.
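
Problem 21 remarks that the no repeated roots condition is relatively easy to determine. One standard way to check it without factoring, offered here as a sketch and not necessarily the method the text has in mind, is to compute the greatest common divisor of $p$ and its derivative $p'$: the polynomial has a repeated root exactly when this greatest common divisor is not constant. The code assumes rational coefficients listed from the constant term up, and the function names are hypothetical.

```python
from fractions import Fraction


def trim(p):
    """Drop zero coefficients from the high-degree end of a coefficient list."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def poly_rem(a, b):
    """Remainder of a divided by b (b must be nonzero)."""
    a = trim([Fraction(x) for x in a])
    b = trim([Fraction(x) for x in b])
    while len(a) >= len(b):
        factor = a[-1] / b[-1]
        shift = len(a) - len(b)
        # Subtract factor * x^shift * b to cancel the leading term of a.
        for i, coeff in enumerate(b):
            a[i + shift] -= factor * coeff
        a = trim(a)
    return a


def derivative(p):
    """Formal derivative of a polynomial given lowest degree first."""
    return [k * c for k, c in enumerate(p)][1:]


def has_repeated_root(p):
    """True when gcd(p, p') is not constant, i.e. p has a repeated root."""
    a, b = trim(p), trim(derivative(p))
    # Euclidean algorithm on polynomials.
    while b:
        a, b = b, poly_rem(a, b)
    return len(a) > 1


# (lambda - 2)(lambda - 1)^2 from Problem 19 has the repeated root 1.
print(has_repeated_root([-2, 5, -4, 1]))
# (lambda - 1)(lambda - 2) has distinct roots.
print(has_repeated_root([2, -3, 1]))
```

Applied to the minimal polynomial $\lambda^3 - 4\lambda^2 + 5\lambda - 2$ of the earlier example, the test reports a repeated root, so that matrix is defective.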
Index



∩, 11
∪, 11

σ(A), 483

Abel's formula, 147 
Abelian group, 419 
adjoint, 338, 343 
adjugate, 134, 161 
algebraic multiplicity, 286 
algebraic number 

minimal polynomial, 442 
algebraic numbers, 441 

field, 443 
angle between vectors, 49 
area 

parallelogram, 63 
area of a parallelogram, 60 
augmented matrix, 79 
axioms for a norm, 56 

back substitution, 79 
parallelepiped

volume, 64 
bases, 189 

basic feasible solution, 251 
basic variables, 88, 251 
basis, 189, 421 

any two same size, 425 
basis of eigenvectors 

diagonalizable, 292 
bijective, 475 
block matrix, 332 
block multiplication, 331 
box product, 65 

Cartesian coordinates, 29 
Cauchy Schwarz inequality, 47, 55 
Cayley Hamilton theorem, 162, 315, 363 
characteristic equation, 279 
characteristic polynomial, 162 
characteristic value, 279 
classical adjoint, 134 
codomain, 13 
cofactor, 128, 158 



cofactor matrix, 128 
column rank, 178 
column space, 178 
companion matrix, 399, 415 
complex conjugate, 18 
complex eigenvalues, 299 

shifted inverse power method, 397 
complex numbers, 17 
complex numbers 

arithmetic, 17 

roots, 21 

triangle inequality, 18 
component, 42, 57 
component of a force, 52 
component of force, 52 
components of a matrix, 98 
components of a vector, 30 
composition of linear transformations, 497 
condition number, 381 
conformable, 103 
consistent, 90 
Coordinates, 27 
Cramer's rule, 138, 161 
cross product, 59, 60 

area of parallelogram, 60 

coordinate description, 61 

distributive law, 63, 66 

geometric description, 60 
cross product 

coordinate description, 61 

distributive law, 63 

geometric description, 60 

parallelepiped, 64 

De Moivre's theorem, 21 
defective, 287 
defective eigenvalue, 287 
derivative, 384 
determinant, 152 

alternating property, 154 

cofactor, 126 

cofactor expansion, 157 

expanding along row or column, 126 

expansion along row (column), 158

linear transformation, 495

matrix inverse formula, 134, 159 

minor, 125 

product, 131, 156, 157 

product of eigenvalues, 361 

row operations, 130 

transpose, 154 
determinant rank 

row rank, 199 
diagonal matrix, 292, 312 
diagonalizable, 291, 292, 312, 329, 494 
differential equations 

first order systems, 449 
dimension, 189 

dimension of vector space, 426 
direct sum, 479 
distance formula, 33 
Doolittle's method, 234
domain, 13 
dot product, 47 

properties, 47 
dynamical system, 313 

echelon form, 80 
eigenspace, 281 
eigenvalue, 279, 483 

existence, 481 
eigenvalues, 162 
eigenvector, 279 

Einstein summation convention, 67 
elementary matrices, 165 
elementary matrix 

inverse, 171 

properties, 171 
elementary operations, 76 
empty set, 12 
entries of a matrix, 98 
equivalence class, 435, 492 
equivalence relation, 435, 492 
exchange theorem, 421 

field extensions, 439 

Field of scalars, 419 

force, 39 

Fourier coefficients, 462 

Fredholm alternative, 197, 198, 345 

free variables, 88, 251 

Frobenius norm, 362

function, 13 

functions, 13 

fundamental theorem of algebra, 22, 515 

Gauss Elimination, 90 



Gauss elimination, 79, 80 

Gauss Jordan method for inverses, 113 

Gauss Seidel, 371 

Gauss Seidel method, 371 

general solution, 224 

solution space, 222 
generalized eigenspace, 483 

direct sum, 481 
geometric multiplicity, 287 
Gerschgorin's theorem, 307 
Gram Schmidt process, 336, 456 
Grammian matrix, 458 
greatest common divisor, 429 

Hermitian, 339 

homogeneous coordinates, 227 
homogeneous system, 222
homomorphism, 211 
Householder matrix, 240 
householder matrix, 226 

inconsistent, 87, 90 
independent set 

extending to a basis, 193 
independent set of vectors 

extending to form a basis, 192 
injective, 13, 475 
inner product

strange example, 57
inner product, 47, 54 

axioms, 451 

Cauchy Schwarz inequality, 454 
inner product 

properties, 55 
integers mod a prime, 420 
intersection, 11 
intervals 

notation, 11 
inverse 

left inverse, 161 

right inverse, 161 
inverses and determinants, 137, 159 
invertible, 111
irreducible, 429 
isomorphism, 211 

Jacobi, 368 
Jacobi method, 368 
Jordan block, 505 
joule, 53 

ker, 222 
kernel, 193, 222
Kronecker delta, 67
Kronecker symbol, 111

Laplace expansion, 128, 157 
leading entry, 80 
least square approximation, 341 
linear combination, 155, 173 
linear independence 

enlarging to form a basis, 426 

equivalent conditions, 185 
linear relationships, 174 

finding them, 186 
linear transformation, 211, 476 

matrix, 214, 221 
linear transformations 

commuting, 479 

dimension, 477 
linear transformation

rotation, 212 
linearly dependent, 421 
linearly independent, 182, 421 
linearly independent sets, 188 
LU decomposition

non existence, 231 
LU factorization 

by inspection, 232 

justification, 236 

multipliers, 233 

solving systems, 234 

main diagonal, 129 
Markov matrices, 302 
matrices 

more columns than rows, 176 

multiplication, 103 

one to one, onto, 200 

similar, 291 
matrix, 97 

composition of linear transformations, 497 

identity, 110 

inverse, 111 

invertible, product of elementary matrices, 201 

left inverse, 161 

left inverse, right inverse, 220 

lower triangular, 129, 161 

main diagonal, 292 

one to one, onto, 220 

raising to a power, 294 

right inverse, 161 

right inverse left inverse and inverse, 177 

rotation, 215 

rotation about given vector, 217 

self adjoint, 313, 357 



symmetric, 313 

transpose, 108 

upper triangular, 129, 161 
matrix exponential, 297 
matrix inverse 

finding it, 113 
matrix multiplication 

ij entry, 105 

properties, 107 
matrix of linear transformation, 489 
mean square approximation, 463 
migration matrix, 302 
minimal polynomial, 210, 482 

computation, 503 

uniqueness, 482 
minimization and orthogonality, 460 
minor, 128, 158 
monic, 429 

monic polynomial, 482 
multipliers, 237 

Newton, 43 
nilpotent, 143, 488 
non defective 

minimal polynomial, 503 
nondefective, 329 
nondefective eigenvalue, 287 
normed linear space, 455 
normed vector space, 455 
null space, 193, 222 
nullity, 195 

one to one, 13, 214 

rank, 208 
onto, 13, 214 
open ball, 33 
operator norm, 377 
orthogonal matrix, 143, 226, 240, 318 

switching two unit vectors, 241 
orthogonal projection, 461 
orthogonality and minimization, 342 
orthonormal, 319, 335 

independent, 455 
orthonormal set, 455 

p norms, 383 
parallelepiped, 64 
parallelogram identity, 473 
particular solution, 222 
partitioned matrix, 332 
permutation matrices, 165 
permutation symbol, 67 
reduction identity, 68 






perp, 197 

perpendicular, 51 

pivot, 87 

pivot column, 81, 174 

pivot columns, 81 

pivot position, 81 

pivot positions, 81 

PLU factorization, 238 

points and vectors, 27 

polar form complex number, 20 

polarization identity, 473 

polynomial, 428 

degree, 428 

divides, 429 

equal, 428 

Euclidean algorithm, 428 

greatest common divisor, 429 

greatest common divisor description, 430 

greatest common divisor, uniqueness, 429 

irreducible, 429 

irreducible factorization, 431 

relatively prime, 429 

root, 428 
polynomials 

canceling, 431 

factorization, 432 
position vector, 30, 31, 40 
power method, 385 
preserving distance, 347 
principal directions, 300
product of matrices 

composition of linear transformations, 497 
projection, 52, 124 
projection of a vector, 52 
projections 

matrix, 219 

QR decomposition, 336 
QR factorization, 240, 241, 336 
thin, 336 

range, 13 
rank 

column determinant and row, 179 

finding the rank, 181 

linear transformation, 496 
rank and singular values, 351 
rank of a matrix, 178, 198 
Rayleigh quotient, 399 
reflection 

across a given vector, 226 
regression line, 344 
regular Sturm Liouville problem, 469 



relations 

graph, 14 
resultant, 42 
right handed system, 59 
right inverse, 113 
right polar factorization, 346 
rotations 

about given vector, 498 
row equivalent, 175 
row operations, 80, 130, 165 
row rank, 178 
row reduced echelon form, 80, 173 

existence, 173 

uniqueness, 175 
row space, 178 

scalar product, 47 

scalars, 29, 97 

scaling factor, 385 

Schur's theorem, 338 

set notation, 11 

shifted inverse power method, 388 

complex eigenvalues, 397 
sign function, 149 
similar matrices, 492 
similarity 

block diagonal matrix, 483 

upper triangular block diagonal, 486 
similarity and equivalence, 492 
similarity relation, 291 
similarity transformation, 492 
simplex tableau, 250, 252 
simultaneous corrections, 368 
singular value decomposition, 350 
singular values, 350 
skew lines, 75 
skew symmetric, 109, 119 
slack variables, 250, 253 
slope, 15 

solution space, 222 
span, 155, 173, 421 
spanning sets, 188 
spectrum, 279 
speed, 43 

standard position, 40 
strictly upper triangular, 505 
Sturm Liouville problem, 469 
subspace, 186, 421 

has a basis, 427 
surjective, 13, 475 
Sylvester's theorem, 478 
symmetric, 109, 119 
symmetric matrix, 320

trace, 363

sum of eigenvalues, 363 
triangle inequality, 36, 48, 56, 455 

complex numbers, 18 

union, 11 

unitary, 338 

upper Hessenberg form, 407 

variation of constants formula, 451 
variational inequality, 473 
vector addition 

geometric meaning, 31 
vector space, 419 

dimension, 426 
vector space axioms, 29 
vectors, 27, 39, 100 

column vector, 100 

row vector, 100 
velocity, 43 

work, 52 

Wronskian, 147, 458 
Wronskian alternative, 450 

zero matrix, 99